[00:35:25] <maumercado> that have not been inserted yet
[00:35:46] <maumercado> kind of like a findAndModify, but if it finds an object, then it should not modify nor insert
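What maumercado describes sounds like an upsert that only ever inserts; a minimal sketch with $setOnInsert, assuming a collection named items (the operation is a no-op when a match already exists):

```js
// With upsert: true and only $setOnInsert, an existing match is left
// untouched; the fields are applied only when a new document is inserted.
db.items.update(
  { _id: someId },                              // match criterion
  { $setOnInsert: { createdAt: new Date() } },  // applied on insert only
  { upsert: true }
)
```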
[01:39:59] <luminux> Hi, I’m new to version 3. Is it still necessary to shard a database to allow better write performance on multicore systems?
[06:35:42] <roelof> If I have this code (http://lpaste.net/134076) why do the last 6 entries not get sorted?
[10:14:11] <donguston> I have thousands of JSON files that I have exported from a website's API, and I want to turn them into a database that I can query to output stuff in the format that I require. Some of them have syntax errors. Is there a way for MongoDB to read them all? Is there anything I can use to attempt to automate the syntax fixes?
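One way to triage the files before importing: a rough Node sketch (the directory, db, and collection names are assumptions) that inserts every file that parses and logs the ones needing repair:

```js
const fs = require('fs');
const path = require('path');
const { MongoClient } = require('mongodb');

async function importDir(dir) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const coll = client.db('scrape').collection('pages');
  for (const name of fs.readdirSync(dir)) {
    const text = fs.readFileSync(path.join(dir, name), 'utf8');
    try {
      await coll.insertOne(JSON.parse(text)); // valid JSON goes straight in
    } catch (err) {
      console.error('skipped %s: %s', name, err.message); // fix these by hand or script
    }
  }
  await client.close();
}
```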
[13:06:13] <deathanchor> yeah I believe StephenLynx is correct
[13:08:22] <StephenLynx> but DennisGer1, if you are hard-coding indexes of arrays, probably something is wrong.
[13:08:36] <StephenLynx> if you are storing it like that.
[13:09:10] <deathanchor> StephenLynx: but I want the 5th member of all the boy bands in my DB.
[13:10:02] <StephenLynx> I can understand when it's something like I had to face recently, with generated HTML pages and forms, where your options are limited. but as soon as that 'something-id' hits the application, you should store each part in its proper field.
[13:12:37] <cheeser> deathanchor: Zayn Malik is gone, man. he quit. let him go.
[13:14:15] <DennisGer1> thanks, let me try StephenLynx.
[13:18:30] <DennisGer1> not working as expected; it's not possible to query a projection like this: arrayname.subarrayname[index]
[13:44:01] <DennisGer1> db.getCollection('serverfarm1').find( {}, {"servers.server.1":1}) is not returning all the sub arrays ...only till mainboard level
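In a projection, a numeric path component like the "1" above is treated as a field name rather than an array position; $slice is the projection operator for selecting array elements by index. A sketch against DennisGer1's collection (the nested field names are taken from his query):

```js
// Project only the second element of the servers.server array:
// skip 1 element, take 1.
db.getCollection('serverfarm1').find(
  {},
  { "servers.server": { $slice: [1, 1] } }
)
```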
[13:56:50] <DennisGer1> still not getting all levels....but thanks so far Stephen. I think I really need to redesign the JSON structure
[14:48:36] <gabrielsch> is there any way to query and retrieve nested collections? here's my "schema": https://gist.github.com/gabrielsch/170736a9682915e63fed
[14:48:44] <gabrielsch> I want to find supplies by productId
[14:50:16] <GothAlice> gabrielsch: Not the way you've structured your data, no.
[14:51:09] <GothAlice> And no ability to index that means a world of pain.
[14:51:12] <saml> in your program, query = {}; query['supplies.'+id+'.productId'] = id
[14:51:33] <saml> your data is weird. rethink, and think outside of the box
[14:51:39] <GothAlice> Instead of {supplies: {some_id: {productId: some_id, …}, …}} if you pivot your data like this, it becomes actually usable: {supplies: [{productId: some_id, …}, …]}
[14:52:05] <GothAlice> db.Supplier.find({"supplies.productId": some_id}) < and you can index on supplies.productId to make this efficient.
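The index GothAlice mentions, spelled out (the Supplier collection name is taken from her example):

```js
// A multikey index on the embedded array's productId field turns the
// find above into an index scan instead of a full collection scan.
db.Supplier.createIndex({ "supplies.productId": 1 })
```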
[14:52:07] <saml> exactly. and use $addToSet or something
[14:54:41] <saml> and each supply document will have an id of supplier
[14:55:04] <GothAlice> It more comes down to data growth (growing records up can be expensive) and patterns of use. Will you always want the parent record's data when getting a single child? If you delete the parent, will you want all children cleaned up automatically? Etc. The article I linked explores a few of the criteria for embedding.
[14:55:06] <saml> db.supplies and db.suppliers but then if you need suppliers data... you might want to denormalize
[14:56:08] <saml> gabrielsch, number of suppliers would be much less than supplies?
[14:56:34] <gabrielsch> saml: for example. 1 supplier supplies N products, you know?
[14:56:37] <saml> how big is each supplier's data?
[14:56:54] <saml> what kind of supplier's data do you want to be joined/coupled with each supply?
[14:56:58] <gabrielsch> saml: it's small, only supplies collection can be big
[14:57:34] <gabrielsch> saml: I think nothing, only reference which are my supplies
[14:57:52] <gabrielsch> saml: when I designed this, I thought that supply is an internal detail of supplier
[14:57:59] <saml> so, I think this makes sense for you: db.supplies {_id: productId, supplier: {name: 'webscale', address: '1st west nyc'}, some other product info}
[14:58:01] <gabrielsch> but then I needed to query supplies from productId :(
[14:58:32] <saml> in that case, {_id:productId, supplier:supplierId, some other product info}
[15:00:10] <saml> it's gonna be a problem.. if you use json serialization
[15:00:33] <GothAlice> Uhm, like any database I hope you aren't trusting user-supplied data for your query source.
[15:00:33] <saml> in some of your micro web scale services that might generate queries
[15:00:49] <nawadanp> There are two commands to manage the balancing of a collection: sh.disableBalancing(namespace) and sh.enableBalancing(namespace). But how can I see if the balancing is enabled or not on these collections?
[15:00:57] <GothAlice> I.e. your application code should be constructing the query, and not through JSON serialization. ;P
[15:01:41] <GothAlice> nawadanp: Ref: the Databases and Sharded Collection sections of http://docs.mongodb.org/manual/reference/method/sh.status/
[15:02:57] <gabrielsch> saml: a product is unique for each supplier
[15:03:33] <nawadanp> Also, when I disable the balancing on a collection, how can I be sure that there isn't any balancer process on it? Currently, I just check that this returns nothing: config.locks.find({ '_id': namespace, "state": 2 }).count()
[15:04:27] <nawadanp> GothAlice, Thanks ! I will check that
[15:05:15] <saml> gabrielsch, so.. maybe db.products and each product is tagged with one or more supplierIds
[15:05:39] <saml> I'm just guessing around. don't listen to me. but think outside the box
[15:11:17] <GothAlice> gabrielsch: It's also very important to remember that having truly dynamic field names (i.e. {foo: {"27": {some_id: "27", …}}} — "27" is being used as a field name here) is in general a very bad way to store data, as you can't index it, you can't search across several nested values at once, only one at a time, etc., etc.
[15:12:14] <nawadanp> About sh.status(), I expected a more specific command, for example getBalancingStatus(namespace)
[15:12:36] <nawadanp> Because sh.status() takes a few minutes to execute...
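For a quicker per-collection check than sh.status(), the flag that sh.disableBalancing() sets can be read straight from the config database; a sketch, run against a mongos (the namespace is an assumption):

```js
// sh.disableBalancing(ns) sets noBalance: true on the collection's
// metadata document, so the flag can be inspected directly.
db.getSiblingDB("config").collections.findOne(
  { _id: "mydb.mycoll" },
  { noBalance: 1 }
)
```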
[15:23:05] <lllama> hello all. I'm upserting a bunch of docs from node, but I'm closing my connection before they've all been written.
[15:23:36] <lllama> Is there a 'drain' event I can listen for, or similar? (I use something like that with postgres.)
[15:44:05] <GothAlice> lllama: If your JS code is async, you'll need to be careful of callback hell, and make sure you only let your app actually quit when all operations are complete.
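A minimal sketch of that pattern with the Node driver's promise API (URL, db, and collection names are assumptions; older drivers expose the same flow through callbacks):

```js
const { MongoClient } = require('mongodb');

async function upsertAll(docs) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const coll = client.db('mydb').collection('docs');
    // Await every upsert so nothing is still in flight when we close.
    await Promise.all(docs.map(d =>
      coll.updateOne({ _id: d._id }, { $set: d }, { upsert: true })
    ));
  } finally {
    await client.close(); // safe now: all writes have been acknowledged
  }
}
```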
[15:47:02] <GothAlice> lllama: Given the client drivers do connection pooling, closing connections seems excessive. ;)
[15:47:47] <lllama> GothAlice: my inserts are being done from a script, so I want the program to exit once I'm done.
[15:48:10] <lllama> GothAlice: i.e. it's not a server process or similar that will hang around.
[15:48:44] <GothAlice> In those scenarios I still don't explicitly close. The connection will be cleaned up when the script exits, regardless. (Not sure if Node has an "atexit" callback registry, though.)
[15:59:38] <deathanchor> you only need to exhaust cursors that you explicitly set to never time out; other than that you don't need to close anything
[17:22:15] <deathanchor> anyone else here use tokumx?
[17:24:52] <deathanchor> tokumx doesn't seem to obey the secondaryPreferred like mongodb does.
[17:25:14] <deathanchor> wondered if anyone else has that issue
[17:25:40] <deathanchor> tokumx version I am using most closely resembles mongo 2.6
[17:25:50] <StephenLynx> funny thing, "but it replaces 1970s B-tree indexing with modern Fractal Tree® indexing technology." why doesn't mongo implement this on their engine?
[17:26:05] <StephenLynx> or their stats are bullshit.
[17:26:19] <StephenLynx> "50x performance improvements 90% reduction in database size"
[17:26:32] <deathanchor> so tokumx is great for compression, yeah about 90%
[17:26:46] <deathanchor> it's super fast for some things.. not all things
[17:27:34] <StephenLynx> does it do anything worse than vanilla mongo?
[17:45:25] <deathanchor> I use ansible for now. we may be forced to use some other paid software in the future, but for now everything else is too complex for our needs, and ansible fills that role easily.
[17:46:17] <deathanchor> I'm interested in UrbanCode
[17:53:31] <bjorn`> Say I wanted to store sensor data (or any data for that matter) dumped once a minute, but I have no use for down-to-the-minute data weeks back. Say I need minute data for the last day, per-hour for the last month, and bi-daily for everything before that; what would be the best way to design this? I've thought of separate collections for each time series, but that seems a bit inflexible. Better suggestions? A
[17:53:37] <bjorn`> worker that cleans the unnecessary records once a day?
[17:54:37] <Spec> statsd/collectd sounds like the right tool for that job :p
[17:54:55] <StephenLynx> " cleans the unnecessary " you could just use expiration.
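The expiration StephenLynx mentions is a TTL index; a sketch for the minute-level data (collection and field names, and the one-day window, are assumptions):

```js
// mongod's TTL monitor removes documents once their createdAt timestamp
// is older than expireAfterSeconds, replacing the daily cleanup worker.
db.minute_samples.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 60 * 60 * 24 }  // keep minute data for one day
)
```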
[17:55:08] <deathanchor> bjorn`: GothAlice has a good article on that
[19:32:02] <cheeser> afaict, that DB reference in the URL is largely for certain systems to be able to configure which database to deal with by inspecting the URL, which can come from, say, stage-specific config files (dev vs. staging vs. prod)
[19:32:11] <cheeser> MongoClient itself just connects to a host.
[19:33:09] <crised_> cheeser: oh ok, I don't have to worry about reconnects or anything, that should be done under the hood, right?
[19:34:21] <GothAlice> Additionally, the db provided in the URL is used as the authenticationDatabase for URI-based configs.
[19:34:33] <GothAlice> Thus you can include credentials in the URL, then switch to another database later.
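A sketch of that pattern in Node (credentials and names are placeholders): authenticate against the database named in the URI, then work in a different one.

```js
// The trailing /admin acts as the authenticationDatabase; once connected,
// the client is free to use any database the user is authorized for.
const { MongoClient } = require('mongodb');

async function connect() {
  const client = await MongoClient.connect('mongodb://user:secret@localhost:27017/admin');
  return client.db('myapp'); // not the database we authenticated against
}
```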
[19:38:53] <cheeser> crised_: right. you might see some churn during a reelection but once the dust settles the driver will pick up the new master for you
[20:02:33] <deathanchor> weird... the profiler writes to magic collections on secondaries; they vanish once you switch dbs.
[20:04:03] <deathanchor> so I'm having a strange time trying to get a distinct query to run on secondaries, but it seems to keep running on the primaries no matter what.
[20:06:22] <deathanchor> well this sucks: https://jira.mongodb.org/browse/JAVA-570
[20:08:59] <StephenLynx> any particular reason you are using an outdated driver?
[20:09:33] <StephenLynx> aren't you the one with the issue?
[20:09:40] <StephenLynx> "<deathanchor> so I'm having a strange time trying to get a distinct query to run on secondaries, but it seems to keep running on the primaries no matter what."
[20:11:18] <deathanchor> so yes, working with a dev who is doing the code changes. I requested the change because a user clicks something which triggers this distinct query, which locks up the DB, which backlogs a bunch of stuff which is my job :)
[20:11:52] <deathanchor> basically I wanted the distinct moved to secondary so it doesn't stop write ops
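For reference, forcing reads to a secondary from the shell looks like this; whether a given driver honors it for distinct is exactly what JAVA-570 is about (collection and field names are assumptions):

```js
// Route reads on this connection to a secondary when one is available.
db.getMongo().setReadPref("secondaryPreferred")
db.mycoll.distinct("field")
```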
[20:12:15] <StephenLynx> well, so just tell him to stop using old dependencies.
[20:12:18] <deathanchor> in qa we couldn't get it to run on the secondary via the code.
[21:54:43] <Owner_> must specify database and collection to print to stdout
[21:54:58] <Owner_> derp, so i guess i wont use that
[22:09:17] <Doyle> Hey. Is there an upper limit to the size of the DB where performance becomes unmaintainable?
[22:10:05] <Doyle> I'm looking at the MongoDB at Scale page and I have to assume that some of these companies have hundreds of TB in their sets.
[22:17:44] <crised_> I'd like to have 2 fields as a primary key: one is a numeric id, and the other is a date timestamp; the numeric id gets repeated through time many times. Can I do complex queries on attributes only on the last occurrence of each id?
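One way to do that with the aggregation framework (collection and field names are assumptions): sort newest-first within each id, keep the first document per id, then filter on its attributes.

```js
// Keep only the most recent document per id, then query its attributes.
db.readings.aggregate([
  { $sort: { id: 1, ts: -1 } },                        // newest first within each id
  { $group: { _id: "$id", latest: { $first: "$$ROOT" } } },
  { $match: { "latest.value": { $gt: 100 } } }         // example attribute filter
])
```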