[00:31:06] <wspider> how could I update the value of a parameter in all documents matching a query, and create the parameter in case it doesn't exist in some document that matches the query?
[00:43:48] <cheeser> the field will be created by the update operation
[00:44:10] <cheeser> e.g., $inc on a field that doesn't exist will create that field with a value of 0 and then apply the $inc
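The create-if-missing behaviour of `$inc` that cheeser describes can be sketched in plain Python (the collection and field names here, like `views`, are illustrative; with pymongo the real call would be an `update_many` with a `$inc` update document, as shown in the comment):

```python
# Plain-Python sketch of MongoDB's $inc semantics. A missing field
# behaves as 0, so a single update both creates and increments it.
# With pymongo, the equivalent server-side operation would be e.g.:
#   db.videos.update_many({"status": "active"}, {"$inc": {"views": 1}})

def apply_inc(doc, field, amount):
    """Mimic $inc: treat a missing field as 0, then add `amount`."""
    doc[field] = doc.get(field, 0) + amount
    return doc

docs = [{"_id": 1, "views": 5}, {"_id": 2}]  # second doc lacks "views"
for d in docs:
    apply_inc(d, "views", 1)

print(docs[0]["views"])  # 6
print(docs[1]["views"])  # 1 -- the field was created, then incremented
```

The same holds for `$set`, which simply creates the field with the given value when it is absent.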
[03:44:02] <GothAlice> MongoEngine versions that work are pinned to older versions of Pymongo which are not compatible with the current server versions.
[03:44:07] <GothAlice> Another _huge_ reason to avoid MongoEngine like the plague.
[03:44:39] <SimpleName> but it’s a basic feature of mongodb 3.x
[03:55:34] <SimpleName> GothAlice: but I want to use model like this: http://paste.ofcode.org/eZ9Evu3JyW4qtc9RWsafn9
[03:55:55] <SimpleName> the app post a json, I can save or check it easy
[03:57:01] <GothAlice> SimpleName: marrow.mongo is what I'm writing to replace MongoEngine. It is not an Active Record Mapper, i.e. the "Document" class does not implement the equivalent of MongoEngine's "objects" attribute, save() method, etc.
[03:57:23] <GothAlice> What it _does_ do is provide an "attribute access dictionary" and ability to define a schema for the same. Document instances can be passed directly to pymongo's own insert_one, etc. methods.
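The "attribute access dictionary" idea can be illustrated with a minimal sketch (this is just the concept, not marrow.mongo's actual implementation):

```python
class AttrDict(dict):
    """Concept sketch of an "attribute access dictionary": instances are
    still plain dicts (so pymongo's insert_one could serialize them
    as-is), while fields are also reachable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so dict methods like .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value  # attribute writes land in the dict

doc = AttrDict(name="Alice")
doc.age = 27
print(doc["age"])  # 27
print(doc.name)    # Alice
# collection.insert_one(doc) would work, since doc is a real dict.
```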
[03:57:49] <SimpleName> you wrote it for your own use?
[03:58:02] <GothAlice> My eventual goal is to have this construct MongoDB validation documents for the defined schemas, too. I.e. make full use of MongoDB features, instead of wrapping everything and hampering use.
[03:58:18] <GothAlice> m.mongo extends https://github.com/marrow/schema < which is > 100% tested.
[03:59:20] <GothAlice> (Earlier versions had more tests than lines of code to test, the current stable version is just under: https://travis-ci.org/marrow/schema/jobs/111391245#L281-L291 ;)
[04:07:38] <GothAlice> https://gist.github.com/amcgregor/6ddbda735e6ded267d31 compares MongoEngine and marrow.mongo querying approaches, too.
[04:08:16] <GothAlice> (Though currently PyMongo doesn't directly accept Ops instances… still haven't figured that one out, thus the "as_query" property.)
[04:23:38] <GothAlice> SimpleName: https://github.com/marrow/mongo/blob/develop/marrow/mongo/query/djangoish.py < this contains an example of MongoEngine/Django-like querying, too. (Simple variation, produces Ops instances like the more SQLAlchemy-like querying by class attribute.) "Djanglish". :)
[06:31:06] <SimpleName> GothAlice: do you still update this project
[06:31:30] <GothAlice> I do. I'm actually working on it right now. :) (Eliminating the distinction between Op and Ops to try to avoid needing the .as_query property.)
[06:33:32] <SimpleName> GothAlice: ;) let me play with you
[06:33:44] <GothAlice> All feedback is greatly appreciated. :D
[06:34:16] <SimpleName> :) My company writes 10,000,000 documents a day, so I think mongoengine may consume too much memory
[06:34:35] <SimpleName> just use mongodb validator
[06:35:12] <GothAlice> Let the server handle that validation for you, then it won't matter how you connect (pymongo or mongo shell), the validation will be used.
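Server-side validation as GothAlice describes it would look roughly like this with pymongo (a sketch only: the collection and field names are illustrative, and this builds a MongoDB 3.2-style query-expression validation document without connecting to a server):

```python
# Sketch of a MongoDB 3.2-style validation document. The "users"
# collection and its fields are hypothetical. Against a live server
# you would attach it at collection-creation time with pymongo:
#   db.create_collection("users", validator=validator)
# From then on *every* client -- pymongo, the mongo shell, anything --
# is subject to the same rules, because the server enforces them.
validator = {
    "name": {"$type": 2},        # BSON type 2 == string
    "email": {"$exists": True},  # must at least be present
}

print(sorted(validator))  # ['email', 'name']
```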
[06:36:11] <GothAlice> Sorry, /me use (which puts your name before what you type after /me) is used to "emote", so there I was agreeing with you by pretending to have a head, and nodding it yes. :)
[06:37:20] <SimpleName> SimpleName: like this right?
[06:41:02] <SimpleName> GothAlice: have you played with postgresql? is it more suited for large data
[06:41:46] <GothAlice> Both MongoDB (document storage) and Postgres (relational storage) are suited to their own type of data.
[06:42:12] <GothAlice> Similarly, neither handle "graph data" very well, so there are other graph databases like Neo4J which handle that type of data better.
[06:42:51] <SimpleName> my friend said postgresql maybe more better for 1000,0000 one day
[06:42:55] <GothAlice> In all things it's best to use the right tool for the job. :) If you need flexibility, MongoDB can be good. If you need highly relational data, or transactions, Postgres might be a better idea.
[06:43:20] <GothAlice> The number of zeros you're using don't match up with where you're putting that comma, making it seem like an exaggeration. ;P
[06:44:05] <GothAlice> MongoDB, used correctly, can efficiently store terabytes of data and process millions of requests. I tested MongoDB out a few years back and got 1.9 million inserted records per second with just one server.
[06:44:38] <GothAlice> So a million in a day should be "easy" by comparison.
[06:46:41] <SimpleName> yes, we do video aggregation, and analyze users by the videos they click or watch
[06:46:55] <SimpleName> every day will generate ten million records
[06:47:17] <GothAlice> We use MongoDB for analytics and primary data storage at work. The trick to using MongoDB efficiently for analytics is a concept called "pre-aggregation".
[06:47:31] <SimpleName> and I need to analyze it,
[06:47:54] <SimpleName> do you use the marrow/schema for your work
[06:47:56] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework < this explains the difference between different methods of storing event-based data, in terms of storage, query performance, and granularity.
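The pre-aggregation trick GothAlice mentions can be sketched as follows (a simulation with plain dicts; the names `video`, `day`, and the `stats` collection are hypothetical): rather than inserting one event document per click, you keep one document per video per day and `$inc` an hourly counter inside it.

```python
# Pre-aggregation sketch, simulated in memory. Instead of one event
# document per click (ten million inserts/day), keep one document per
# (video, day) pair and increment an hourly bucket. With pymongo the
# equivalent upsert would be roughly:
#   db.stats.update_one(
#       {"video": vid, "day": day},
#       {"$inc": {"hits.%02d" % hour: 1}},
#       upsert=True)
from collections import defaultdict

stats = defaultdict(dict)  # stands in for the "stats" collection

def record_hit(video, day, hour):
    bucket = "%02d" % hour
    doc = stats[(video, day)]                # upsert: created if missing
    doc[bucket] = doc.get(bucket, 0) + 1     # $inc semantics

for _ in range(3):
    record_hit("cats.mp4", "2016-03-01", 14)
record_hit("cats.mp4", "2016-03-01", 15)

print(stats[("cats.mp4", "2016-03-01")])  # {'14': 3, '15': 1}
```

Queries for "hits per hour for video X on day Y" then read a single small document instead of scanning millions of raw events, trading granularity for storage and query speed as the linked article explains.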
[06:48:06] <SimpleName> ;) I think I can follow the step of you
[06:48:11] <GothAlice> Quite a bit, yes. We actually use marrow.schema to define our controllers, not just data models.
[06:48:35] <GothAlice> m.schema is a true generic schema system, letting you define flexible structures for almost anything.
[06:49:12] <SimpleName> so just use m.schema to define my model, instead of pymongo, right? I think no model means no design
[06:49:33] <GothAlice> PyMongo doesn't really define schemas at all.
[06:50:33] <SimpleName> in your job, what do you use to define your mongodb models?
[06:50:53] <GothAlice> marrow.mongo, which uses marrow.schema. We used to use MongoEngine, but not any more.
[06:51:51] <SimpleName> :D I've decided to use marrow/mongo too, because mongoengine sucks
[06:52:15] <SimpleName> ;) And I can learn more from you
[06:52:17] <GothAlice> I will note that marrow.mongo is currently not "stable", it's a work in progress. Much of it is functional, though, like the Document structure.
[06:52:40] <SimpleName> :( so there are lots of bugs
[06:55:21] <GothAlice> I'm fairly careful to avoid bugs; there are areas where it's not entirely complete.
[06:55:51] <GothAlice> marrow.schema, as I mentioned earlier, has complete test coverage; it's guaranteed to work as intended.
[06:56:56] <SimpleName> Could not find a version that satisfies the requirement marrow.mongo (from versions: )
[06:57:50] <GothAlice> SimpleName: Because it's not 100% ready for use, there is no released version. You'll need to check out the source from Github to try it out.
[06:57:56] <GothAlice> I do not release software that is incomplete. :)
[06:58:25] <SimpleName> :( so I have to use mongoengine now until you release it
[06:58:29] <GothAlice> (Once it is released, this means you won't ever have to worry about updates breaking your code.)
[06:58:36] <SimpleName> because I write this project for my company
[06:58:57] <SimpleName> more stable is very important
[06:59:20] <GothAlice> Hmm; that's reasonable, though not technically a limitation. You can install using pip from git. I'd still have a look into marrow.schema; it is released and very stable.
[07:00:28] <GothAlice> I wouldn't recommend using MongoEngine at all, though. Start with just pymongo.
[07:00:39] <GothAlice> MongoEngine is not stable at all.
[07:01:03] <GothAlice> https://github.com/marrow/contentment/issues/12 < see all of these tickets
[07:01:07] <SimpleName> what’s the difference between marrow.schema and marrow.mongo
[07:02:02] <GothAlice> marrow.schema is just a declarative schema system. It is not specialized to any use, but instead, general to apply to any use. It provides the underlying "declarative schema" (using classes to define data structures) system, plus validation and data transformation.
[07:02:35] <GothAlice> marrow.mongo is the specialization of marrow.schema to interoperate with pymongo, allow for querying using field references, etc.
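What "declarative schema" means here can be shown with a toy sketch (this illustrates the general pattern of using class attributes to define and validate data structures; it is not marrow.schema's real API):

```python
class Field:
    """Declares a typed field on a Document subclass (toy example)."""
    def __init__(self, type_):
        self.type = type_

class Document:
    """Base class: validates keyword data against the declared fields."""
    def __init__(self, **data):
        for name, field in list(vars(type(self)).items()):
            if not isinstance(field, Field):
                continue  # skip methods, dunders, etc.
            value = data.get(name)
            if value is not None and not isinstance(value, field.type):
                raise TypeError("%s must be %s" % (name, field.type.__name__))
            setattr(self, name, value)

class User(Document):        # the schema is declared as a class
    name = Field(str)
    age = Field(int)

u = User(name="Alice", age=27)
print(u.name)                # Alice
# User(name=123) would raise TypeError: name must be str
```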
[07:06:27] <SimpleName> GothAlice: marrow.mongo, not only define model, and validator. so is there any other good function
[07:07:37] <GothAlice> I'm sorry, I don't understand the question. m.mongo extends m.schema to more easily work with pymongo. It also provides some helper functions, like an easier way to use capped collections, query using a Django/MongoEngine-style, etc.
[07:09:41] <GothAlice> Have a look around the code. Nothing it does is magic. :)
[07:10:11] <SimpleName> ok, let me spend some time look your source code
[07:16:52] <SimpleName> so, for now I just install the development version, right? GothAlice
[07:17:01] <SimpleName> cd mongo; git pull; python setup.py develop
[07:19:29] <SimpleName> do you use the mongodb validator, or just a python code validator?
[07:24:51] <GothAlice> SimpleName: Use pymongo to connect. I use MongoDB validation documents.
[07:25:26] <GothAlice> Remember: unlike MongoEngine, marrow.mongo does not try to wrap or replace everything pymongo provides. Instead, it encourages you to use pymongo and provides utilities that work with pymongo.
[07:26:31] <GothAlice> Also, pushed some changes, including that Op/Ops simplification. Now there's even less code. :)
[07:34:48] <SimpleName> GothAlice: I will spend a day reading your code and testing it, because I need to take care of my company's code. Today and tomorrow I have enough time to learn from you
[07:35:30] <GothAlice> No worries; I understand completely. I'll be working on it all tonight and adding many tests, so it should become more obviously stable in the next 24h, too. :)
[15:00:53] <AlmightyOatmeal> i'm brand new to mongodb and i was wondering if someone would be so kind as to help me figure out a query? i'm taking results from elasticsearch and storing them in mongo for analysis using pandas, and so far i'm successful in iterating arrays, but the unique data structure has left me a bit confused -- here is some information on the model and my current aggregation query: http://pastebin.com/8iD2imEr
[15:02:28] <AlmightyOatmeal> within an object there is an array that contains values that correspond with values within the same object that the array is in but i can't figure out how to iterate over the array and grab the value from the object. i reference $_source.$_properties but i get scolded that i cannot have a reference/name with a period in it when i try to get the value of the property :(
[17:46:12] <BadApe> hi, i've used couchdb before and i used to test out map reduce functions in an editor, is there something like that for mongo?
[22:06:18] <MacWinner> if you convert a replica set to a sharded cluster (with a single shard). Can application continue to connect to the replica set directly without going through mongos?
[22:06:51] <MacWinner> basically I want to phase mongos into the architecture.. i want to move to the sharding architecture pre-emptively before it gets too late
[22:39:24] <cheeser> you should connect through mongos
[23:28:40] <MacWinner> cheeser, so after I add the replica as a shard, is the replicaset modified in some way where it won't be able to be connected to like a normal replicaset? I don't plan on sharding any collection currently. Just want to introduce the config server and mongos to my architecture and let it burn in for a bit
[23:29:06] <MacWinner> ie, if I wanted to remove the config servers and mongos and just leave the original replica set, would that be possible?
[23:46:34] <MacWinner> awesome, thank you.. any obvious gotchas?
[23:52:27] <cheeser> once you start balancing shards you'll need to consolidate. and once balancing starts, connecting directly to the replSet is dangerous.
[23:52:54] <cheeser> mongod doesn't currently track chunk status, so you could find stale data because mongod doesn't know that some docs now live on other nodes.
[23:58:09] <MacWinner> cool.. so is it the case that a replica set is not really aware of the existence of the sharded cluster? the sharding, chunking and routing is managed completely outside of the replset, in the config server and mongos?
[23:58:57] <MacWinner> conceptually mean.. like a brick is just part of a building.. it doesn't really know how it got to where it is.. it just does what it's supposed to do