#mongodb logs for Tuesday the 17th of November, 2015

[00:52:04] <dusky_> Hello, can anyone help me with a single question?
[00:52:19] <joannac> It would help if you just asked the question
[00:53:11] <dusky_> i'm new to nosql, I read the docs but couldn't make a single query work
[00:53:28] <dusky_> I have json imported into a collection, like this: "itens":{ "Sticker | Banana":{ "value":1 }, "Sticker | Bash (Holo)":{ "value":2.46 }, "Sticker | Bish (Holo)":{ "value":2.15 }
[00:53:45] <dusky_> How can I get the value of an item named "Sticker | Banana"?
[00:54:01] <dusky_> db.itens.find("Sticker | Banana")
[00:54:14] <dusky_> sorry for my english!
[00:55:34] <compeman> dusky_
[00:55:39] <dusky_> yes
[00:56:00] <compeman> dusky_ use name:"blabla",value:"1"
[00:56:06] <compeman> for storing your data
[00:56:28] <dusky_> nice, and how do I get the value for a name?
[00:56:29] <compeman> this is the first time i see such a document
[00:57:10] <compeman> then you can find your document via db.items.find({name:"blabla"})
[00:57:53] <dusky_> will try
[00:58:19] <compeman> then you can do other stuff to get the value output
[00:58:26] <compeman> this will output the whole document
[00:59:12] <dusky_> it doesn't find anything
[00:59:19] <dusky_> but i don't have an attribute called name
[00:59:49] <dusky_> can I post a link here to a screenshot on imgur?
[01:00:58] <compeman> dusky_ as i said, use attributes for each piece of data in your document. this is weird
[01:01:06] <dusky_> imgur.com/2t4gVkV
[01:01:17] <dusky_> i think i didn't import it with an attribute
[01:01:44] <dusky_> for each row imported i have an attribute named "value"
[01:02:50] <dusky_> look at this json, i think it is incorrect and i need to make an attribute like itemName for each item?
[01:12:00] <dusky_> i have this api that returns json for me, http://api.ncla.me/itemlist.php, how can i format it and add the attribute itemName before each item, example: "item"Sticker | Banana":{ "value":1 },
[01:12:27] <dusky_> "itemName":"Sticker | Banana":{ "value":1 }
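
The shape dusky_ pasted (item names as object keys) can still be queried with dot notation, since the keys contain no dots; but restructuring into one document per item, as compeman suggests, is what makes indexing and lookups straightforward. A minimal mongo shell sketch, assuming the original collection is named itens and the restructured one items (both names are guesses from the paste):

    // Original shape: the item name is a key, so use dot notation.
    db.itens.findOne(
        { "itens.Sticker | Banana": { $exists: true } },
        { "itens.Sticker | Banana.value": 1 }
    )

    // Restructured shape: one document per item, with an itemName attribute.
    db.items.insert([
        { itemName: "Sticker | Banana", value: 1 },
        { itemName: "Sticker | Bash (Holo)", value: 2.46 }
    ])
    db.items.createIndex({ itemName: 1 })
    db.items.findOne({ itemName: "Sticker | Banana" }).value
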
[01:16:50] <fxmulder_r_u> so I have an index where isp_domain is the first key; assuming there are fewer than 50 unique domains, $db->run_command( { distinct => "table", key => "isp_domain" } ); should return pretty much right away, no?
[01:32:13] <fxmulder> it obviously isn't using the index and I'm not sure why
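
One way to check what fxmulder suspects is to ask the server for the query plan; on MongoDB 3.0+ the explain command wraps distinct (names taken from his snippet):

    // Does the distinct use the index on isp_domain?
    db.runCommand({
        explain: { distinct: "table", key: "isp_domain" },
        verbosity: "executionStats"
    })
    // An IXSCAN / DISTINCT_SCAN stage means the index is used;
    // a COLLSCAN stage means a full collection scan, which would explain the delay.
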
[04:29:51] <godzirra> Can anyone help me figure out what's going on? I'm trying to use "mongo --eval" and I'm having a lot of trouble. https://gist.github.com/slooker/85199345eab6d7419790
[04:43:06] <Boomtime> godzirra: bash weirdness, i am not sure what is going on, but it isn't the mongo shell
[04:43:15] <godzirra> Yeah, I just can't figure out what the deal is.
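
The gist has the details, but a common source of "bash weirdness" with --eval is the shell expanding characters inside double quotes before mongo ever sees the script. A hedged illustration (the collection name is invented, and this may not be godzirra's exact problem):

    # Double quotes: bash expands $var (and, interactively, !history)
    # before mongo runs the script.
    mongo --quiet --eval "db.users.find({name: \"$name\"}).count()"

    # Single quotes pass the script through untouched.
    mongo --quiet --eval 'db.users.find({name: "bob"}).count()'
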
[05:29:22] <crk> is it viable to implement cookie based retention of my preferred mongodb version for the documentation pages?
[05:29:33] <crk> it's irksome to have to select 2.6 each time :(
[05:30:09] <crk> if this sounds like something that could be done, please direct me to where I must request it officially (bug tracker, request tracker etc.)
[05:32:03] <joannac> crk: just keep going to https://docs.mongodb.org/v2.6/ ?
[05:32:20] <joannac> rather than just https://docs.mongodb.org/manual/ (which goes to the current version)
[05:32:30] <crk> joannac, more often than not I find myself arriving at mongodb documentation through links from other pages
[05:32:38] <crk> I do make sure I save my link URLs with the full path (including version)
[05:33:09] <crk> so whenever I get there, I have to set the desired version first, before reading on. forgetting to do so has led me into trouble a few times :P
[05:33:36] <crk> would it not be smarter if the documentation remembered what version I was browsing last, and defaulted to it?
[05:33:56] <joannac> I can see both sides of this. I think it's fraught with danger if I can link you to something and not know if you actually see the same thing I do
[05:34:25] <crk> joannac: that's an angle I did not consider... hmm.
[05:34:48] <joannac> personally, it would drive me nuts. But I am not the typical user :)
[05:35:51] <joannac> crk: https://jira.mongodb.org/browse/DOCS-5355
[05:44:51] <crk> joannac: thanks for the prompt assistance :) I understand that this is something that needs more thought
[05:46:07] <joannac> crk: no problems, and I know your pain
[05:46:32] <joannac> feel free to add comments to the DOCS ticket
[05:46:34] <crk> even as a non-typical user?
[05:46:38] <crk> sure. I'll look into that.
[05:47:19] <joannac> yeah, for sure. having to switch versions all the time is a pain: one person is on 2.6 so needs 2.6 docs, and another is on 3.0 so needs 3.0 docs
[06:50:26] <agatoxd> ping
[08:32:13] <motaka2> Hello, should everyone migrate from mysql to mongodb ?
[08:36:40] <kali> nope
[09:03:44] <dvargek> Hi, does anybody have experience on failover of a sharded cluster?
[09:12:02] <kali> dvargek: failover is handled at the replica set level, the sharding stage should barely be aware that something happened
[10:20:17] <dvargek> kali: thanks for the response. I got some weird behaviour when powering down one of the nodes inside my cluster
[10:21:28] <dvargek> I'm running a cluster with 3 shards, based on 3 replica sets, with 3 separate mongos instances
[10:22:09] <dvargek> when I just stop the mongod and mongo-configsrv services on one of the machines, failover works as expected
[10:22:42] <dvargek> but when powering down the same machine, so the ip isn't available anymore, the remaining mongos instances behave really slowly
[10:23:01] <dvargek> logging in via mongos and queries take more than 20 seconds each
[12:57:50] <Rafibd01717> why should I use mongodb instead of traditional sql?
[12:58:03] <StephenLynx> I don't know.
[12:58:05] <StephenLynx> why?
[12:58:32] <StephenLynx> that really depends on your use-case.
[12:58:55] <StephenLynx> mongo doesn't replace relational databases, it just fits different use cases.
[12:59:03] <rom1504> Rafibd01717: if you don't want to define tables maybe
[12:59:23] <StephenLynx> if your use case requires a relational database, then you are better using a relational database.
[12:59:32] <Rafibd01717> mongo's concept of documents is actually the same as tables
[12:59:37] <StephenLynx> no.
[12:59:54] <Rafibd01717> why no?
[12:59:57] <StephenLynx> a table stores entries that all have the same structure.
[13:00:09] <StephenLynx> mongo stores documents with arbitrary fields.
[13:00:34] <StephenLynx> also, documents are allowed to have sub-documents
[13:00:54] <Rafibd01717> you mean arrays?
[13:00:56] <StephenLynx> no.
[13:01:09] <rom1504> there is no schema definition with mongo
[13:01:26] <rom1504> it's a good thing for some uses, and bad for others
[13:01:41] <StephenLynx> we won't be able to explain it to you, you will have to learn mongodb to see how it fits different use cases.
[13:01:45] <Rafibd01717> but what is the answer to my original question?
[13:01:59] <StephenLynx> we can't answer you because that depends on information you didn't give us.
[13:02:12] <StephenLynx> which is: what you need from a database.
[13:02:14] <Rafibd01717> what are the benefits of mongodb over sql is the answer I guess
[13:02:31] <StephenLynx> what are the benefits of a banana over an apple?
[13:02:57] <Mattias> StephenLynx: It tastes better!
[13:03:02] <StephenLynx> >opinion
[13:03:05] <Rafibd01717> yes
[13:03:23] <Rafibd01717> and this is not something that I can answer during an interview
[13:03:55] <Rafibd01717> Actually I see the nosql concept becoming popular nowadays, but I'm always wondering what made it so popular over SQL
[13:03:57] <StephenLynx> again. mongo is not absolutely better or worse than relational databases.
[13:03:58] <rom1504> I answered this
[13:04:08] <StephenLynx> it is different and fits different use cases.
[13:04:23] <Rafibd01717> stephen for example?
[13:04:30] <StephenLynx> brb lynch
[13:04:32] <StephenLynx> lunch*
[13:04:35] <Mattias> Rafibd01717: You don't have to create a schema beforehand, you can just insert data however you want. And it handles JSON natively! So you can search it without problems. Try following a mongodb tutorial or something.
[13:05:26] <Rafibd01717> Mattias: so schema is the one benefit
[13:05:28] <Rafibd01717> I mean no schema
[13:06:34] <rom1504> kind of. You just insert structured stuff instead of inserting tuples
[13:06:55] <rom1504> mongo is not the only way to do nosql though
[13:07:10] <Rafibd01717> I know but it is the leading one
[13:07:17] <rom1504> (for example there are graph databases, which are pretty different to mongo)
[13:07:19] <Rafibd01717> I mean the best among others
[13:07:46] <rom1504> not really, it depends on your use case
[13:07:58] <rom1504> you wouldn't really represent a graph in mongodb
[13:08:15] <rom1504> it may be the leading document store yes
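
To make the "no schema definition" point concrete: documents of different shapes, including sub-documents, can live in the same collection with no ALTER TABLE step. A minimal sketch (the people collection and its fields are invented for illustration):

    // Two differently shaped documents in one collection.
    db.people.insert({ name: "ada", langs: ["js", "c"] })
    db.people.insert({
        name: "bob",
        address: { city: "Oslo", zip: "0150" }   // a sub-document
    })

    // Dot notation reaches into the sub-document.
    db.people.find({ "address.city": "Oslo" })
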
[13:09:43] <Rafibd01717> well I have some questions regarding user management of MongoDB
[13:17:05] <Rafibd01717> should I create a user in the "admin" database with the "userAdminAnyDatabase" role to enable security in my mongodb server?
[13:17:25] <Rafibd01717> then the rest of the users will be created by logging in as this user?
[13:17:48] <Rafibd01717> I mean, this user will be the super user who can do anything in the mongodb instance?
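
That is the usual bootstrap sequence: create one user administrator in the admin database, then authenticate as it to create the application users. A sketch (usernames, passwords, and the myapp database are placeholders); note that userAdminAnyDatabase can manage users and roles on any database but does not by itself grant read or write access to data:

    use admin
    db.createUser({
        user: "useradmin",
        pwd: "changeme",
        roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
    })

    // Later, authenticated as useradmin, create a normal application user:
    use myapp
    db.createUser({
        user: "appuser",
        pwd: "secret",
        roles: [ { role: "readWrite", db: "myapp" } ]
    })
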
[13:34:46] <StephenLynx> back
[13:35:29] <StephenLynx> no, the lack of schema validation is not the only difference that can be beneficial. documents having sub-documents is very useful too.
[13:35:48] <StephenLynx> the ability to add more servers to a cluster without having to turn the db off
[13:35:55] <StephenLynx> and the ease with which you can do so.
[13:36:11] <StephenLynx> also, mongo has better performance when dealing with larger datasets.
[13:57:15] <dddh> can mtools install several instances only on the same server?
[15:51:03] <repxxl> how can I sort multiple mongo cursors by date? I mean, I have a database schema where I can't get everything with one query, so I use multiple collections
[15:54:45] <StephenLynx> just sort the cursors.
[15:55:12] <StephenLynx> since you can't operate with multiple collections at once, it makes no difference on your application how many cursors you are working with.
[15:55:24] <repxxl> StephenLynx yeah but I need them combined and sorted .. u know
[15:55:35] <StephenLynx> mongo can't do that.
[15:55:46] <StephenLynx> that is only done on application code.
[15:56:06] <repxxl> StephenLynx aaa alright
[15:56:41] <cheeser> i don't know any database that'd let you do that on multiple result sets
[15:56:51] <cheeser> you'd have to join them and sort the one result set.
[15:57:38] <StephenLynx> yeah, at most you could use sub-queries on sql, but your application still only gets one result set.
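
Since the server cannot sort across collections, the merge lives in application code: drain each cursor, concatenate, and sort by the shared date field. A minimal shell JavaScript sketch (collection and field names invented):

    var posts = db.posts.find({ userId: 42 }).toArray();
    var comments = db.comments.find({ userId: 42 }).toArray();

    // Merge the two result sets and sort newest-first by a common date field.
    var merged = posts.concat(comments).sort(function (a, b) {
        return b.createdAt - a.createdAt;
    });
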
[16:00:46] <repxxl> btw, do you suggest I keep _id as the object id if I'm not going to use it?
[16:01:08] <StephenLynx> afaik, you don't have a choice.
[16:08:36] <repxxl> can I make the MongoDate less precise? to be like the _id timestamp for example, I don't need milli/micro/ultra seconds
[16:08:49] <cheeser> you can't
[16:09:14] <repxxl> cheeser not even on the application side? and what format even is MongoDate?
[16:09:27] <cheeser> though you can normalize them if you want. but the date object will still track those values even if you 0 them out
[16:10:30] <repxxl> cheeser u know, because when i sort by _id i get the correct order, but when i sort by MongoDate i just don't get the right order because it is too precise
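
"Normalizing" here means zeroing out the precision you don't want in application code before storing; the BSON date itself always carries millisecond resolution. A sketch:

    // Truncate a date to whole seconds, like the ObjectId timestamp.
    var now = new Date();
    var truncated = new Date(Math.floor(now.getTime() / 1000) * 1000);
    db.events.insert({ at: truncated })   // "events" is a hypothetical collection
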
[16:14:29] <repxxl> cheeser also, why is the _id index not automatically unique by default?
[16:14:49] <StephenLynx> I think it is.
[16:14:50] <cheeser> it is
[16:15:40] <repxxl> when I run db.col.getIndexes() I don't see it
[16:15:40] <cheeser> being too precise *can't* result in the wrong order.
[16:17:04] <repxxl> the unique : true is missing
[16:17:21] <repxxl> or is it because I'm on an older version of mongo, idk ...
[16:18:00] <cheeser> it just doesn't show for whatever reason. but it's unique.
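
For reference, this is roughly what repxxl is seeing: the _id index prints without a unique flag, yet the server enforces uniqueness on it regardless (the v field varies by version):

    > db.col.getIndexes()
    [ { "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.col" } ]
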
[16:18:42] <repxxl> cheeser ok, so when I set my own unique ids in the _id field I don't need to createIndex with unique: true, right? it's automatic
[16:18:52] <repxxl> cheeser because I want to change the _id to my custom unique ids
[16:19:10] <cheeser> you can't change existing _id values, fwiw
[16:19:55] <repxxl> cheeser oh, I thought I could change the _id. so it's fixed and I can't do anything with it, right?
[16:21:50] <StephenLynx> no, you will need to create a separate unique index.
[16:22:00] <StephenLynx> I do that often and completely ignore _id
[16:22:30] <repxxl> StephenLynx okey well :)
[16:22:47] <cheeser> _id should have no business meaning, imo
[16:23:24] <StephenLynx> yeah, you can only use it on rare occasions.
[16:23:45] <StephenLynx> when a meaningless identification string can be used and you have no other means of identification of something.
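
The pattern StephenLynx describes: leave _id alone and give the business identifier its own field with a unique index. A sketch with an invented slug field:

    db.articles.createIndex({ slug: 1 }, { unique: true })
    db.articles.insert({ slug: "hello-world", title: "Hello" })
    db.articles.findOne({ slug: "hello-world" })   // _id remains an ObjectId
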
[16:59:29] <repxxl> does an integer 1 take the same number of bytes as a string "1"?
[17:00:16] <cheeser> no
[17:11:24] <StephenLynx> I believe it would take more space.
[17:11:46] <cheeser> a string would yes
[17:11:55] <StephenLynx> the int
[17:12:02] <cheeser> you think?
[17:12:05] <StephenLynx> at least in C the int would take more space.
[17:12:23] <StephenLynx> since sizeof char returns 1 and sizeof int returns 8 I guess.
[17:12:25] <cheeser> the int is 32 bits. the string would be at least 2 ints maybe 5.
[17:12:43] <StephenLynx> no, the string would be made of chars, not ints.
[17:13:07] <cheeser> well, i mean it'd contain the same bit size
[17:13:55] <cheeser> strings are UTF-8 encoded, fwiw
[17:14:01] <StephenLynx> I managed to reduce RAM usage on my C code by using chars and shorts instead of ints.
[17:14:21] <StephenLynx> hm, yeah, I don't know how it would go with UTF
[17:14:36] <StephenLynx> what is the highest value for a single character on it?
[17:15:26] <cheeser> up to 4 bytes, iirc
[17:15:44] <repxxl> but a 12 long string will always equal 12 bytes, no?
[17:15:57] <StephenLynx> not really
[17:16:07] <repxxl> or it depends what characters you use
[17:16:09] <cheeser> what is a 12 long string?
[17:16:18] <repxxl> 12 characters long string
[17:16:19] <StephenLynx> it depends on the encoding.
[17:16:20] <repxxl> sry
[17:16:39] <StephenLynx> for example
[17:16:51] <StephenLynx> if an unsigned char can go up to X
[17:16:59] <StephenLynx> and your encoding specifies more values than X
[17:17:06] <StephenLynx> then you won't be able to use char for your string
[17:17:14] <StephenLynx> and will have to use some type that is larger than char.
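
For repxxl's original question in BSON terms: an int32 value is 4 bytes, while a string value is a 4-byte length prefix plus its UTF-8 bytes plus a NUL terminator, so "1" costs 6 bytes to the int's 4. Object.bsonsize() in the mongo shell confirms it (the 12 and 14 include the document framing and the field name, identical for both):

    > Object.bsonsize({ a: NumberInt(1) })
    12
    > Object.bsonsize({ a: "1" })
    14
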
[17:22:26] <repxxl> StephenLynx can i ask you something ? do you use for every ids on your project "the 12 byte long ObjectId" ?
[17:22:41] <StephenLynx> rarely.
[17:22:53] <StephenLynx> usually I define some human-friendly unique index.
[17:24:06] <StephenLynx> what are you concerned with?
[17:24:49] <repxxl> StephenLynx well, I'm reading this http://stackoverflow.com/questions/6645277/should-i-implement-auto-incrementing-in-mongodb especially the answer "42", it starts with "I strongly disagree with author"
[17:25:50] <repxxl> StephenLynx so I'm concerned that in the future my human-friendly unique indexes will be too large, in bytes and RAM usage I mean. I want to do it efficiently
[17:26:07] <StephenLynx> oh, that?
[17:26:20] <StephenLynx> it doesn't matter because you can't have documents without that field and that index.
[17:26:24] <yopp> um
[17:26:41] <StephenLynx> global auto-increments are bad, I must agree with that.
[17:26:46] <yopp> stupid question: is it possible to stepDown the master in a "graceful" way, so no queries are lost?
[17:27:03] <StephenLynx> you can't opt-out of _ids and they are guaranteed to be unique anyway.
[17:27:09] <repxxl> StephenLynx I use a random base64 string now, but when it gets too long in bytes it will also use more RAM, no?
[17:27:11] <yopp> Right now I'm seeing that mongos clients are disconnected when the master steps down
[17:27:35] <StephenLynx> there is no reason to use a random base64 string rather than the _id
[17:28:01] <StephenLynx> as I said, it doesn't matter how much RAM _id uses because you can't not use it.
[17:28:20] <StephenLynx> whatever you implement to substitute it won't substitute it, just add more overhead.
[17:28:50] <StephenLynx> the ONLY scenario where an auto-incremented field can be justified, IMO, is when it isn't global.
[17:28:54] <StephenLynx> for example:
[17:29:00] <StephenLynx> you are storing forum posts.
[17:29:07] <StephenLynx> and you have multiple sub-forums
[17:29:35] <StephenLynx> and posts have unique auto-incremented id by sub-forum, making it possible for two posts to have the same id, as long as they live in different forums
[17:29:35] <repxxl> StephenLynx yeah, I got you, but wait: you use _id like everyone else and also a human readable string id like me, but the human readable string will take too much RAM to get indexed. you could just live with the _id
[17:30:04] <StephenLynx> a base64 string is not human readable.
[17:30:16] <yopp> StephenLynx, incremental ids are crucial for ordering
[17:30:21] <StephenLynx> they are not.
[17:30:35] <StephenLynx> insertion date can do the same
[17:30:43] <StephenLynx> and _id allows you to retrieve that information.
[17:31:00] <StephenLynx> and natural sort is barely useful anyway.
[17:31:01] <yopp> you can't guarantee that the date is the same everywhere
[17:31:07] <StephenLynx> you wot
[17:31:13] <StephenLynx> where everywhere?
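
The information StephenLynx says _id lets you retrieve: the first 4 bytes of an ObjectId are a creation timestamp, at second precision, taken from the clock of whichever machine generated the id (which is the basis of yopp's objection):

    > var id = ObjectId()
    > id.getTimestamp()
    ISODate("2015-11-17T17:31:13Z")   // whole seconds only
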
[17:31:33] <repxxl> StephenLynx how big are your human readable id strings ?
[17:31:48] <StephenLynx> not too long, they are usually user logins, forum names
[17:31:48] <repxxl> StephenLynx i mean in length
[17:32:05] <yopp> on the hardware level, there is no guarantee that each date sample will be unique, or that it will be the same on all servers
[17:32:07] <StephenLynx> I limit them to 16 or 32 characters usually, sometimes less.
[17:32:26] <StephenLynx> yopp what
[17:32:33] <StephenLynx> the _id won't change
[17:32:38] <StephenLynx> how can you get a different date?
[17:33:03] <yopp> you're not following: ordering _can't_ be guaranteed by the timestamp
[17:33:18] <repxxl> StephenLynx i have the same for users as you; for my "links" i use _id and an access_id, which is something like "RHU5wt2upKMh", a 12 character long base64 string which is also indexed
[17:33:46] <StephenLynx> yopp with _id I am pretty sure it is obtained by a single machine.
[17:34:03] <StephenLynx> sounds redundant, repxxl
[17:34:24] <StephenLynx> why the base64 string?
[17:34:25] <repxxl> StephenLynx you mean the length of the string ?
[17:34:36] <StephenLynx> no, unreadable and arbitrary strings.
[17:34:45] <repxxl> StephenLynx what do you suggest me ?
[17:34:50] <repxxl> StephenLynx instead of
[17:35:10] <StephenLynx> if ALL of the identifiers for something are arbitrary and unreadable, identify by the _id.
[17:35:19] <StephenLynx> period.
[17:35:30] <StephenLynx> why do you need access_id?
[17:36:12] <repxxl> StephenLynx i have something like mypage.com/RHU5wt2upKMh to access my links
[17:36:13] <yopp> StephenLynx, yeah, sure.
[17:36:35] <repxxl> StephenLynx because the _id seemed to me too long
[17:36:44] <yopp> StephenLynx, you have a sharded mongo with 2 servers, and you insert a document. What exact timestamp will you get?
[17:36:45] <StephenLynx> it doesn't matter
[17:36:57] <StephenLynx> because no one is typing RHU5wt2upKMh
[17:37:00] <StephenLynx> NO ONE
[17:37:07] <StephenLynx> it could be 256 characters
[17:37:12] <StephenLynx> it doesn't make a single difference.
[17:37:23] <StephenLynx> because no one is typing RHU5wt2upKMh
[17:37:25] <repxxl> StephenLynx that's true ...
[17:37:46] <repxxl> StephenLynx but if someone is going to share the link on skype or wherever, the link looks kind of crazy, no? :D
[17:37:48] <StephenLynx> yopp I am not sure, but I have a guess it will be obtained on a single machine.
[17:37:55] <yopp> StephenLynx, it is not
[17:38:04] <StephenLynx> doesn't look any less crazy than RHU5wt2upKMh
[17:38:10] <StephenLynx> yopp got a source for that?
[17:38:10] <repxxl> :D
[17:38:28] <StephenLynx> you could look at facebook links
[17:38:31] <StephenLynx> they look crazy as hell.
[17:40:30] <yopp> repxxl, generally you should not care about the ids in the urls. Nobody cares. The major drawback of ObjectId is its size. If you have a lot of small records (which is where mongo sucks a lot, even with WT), the storage overhead of the _id field and its index can approach the size of the data itself.
[17:40:48] <StephenLynx> yopp https://docs.mongodb.org/manual/faq/sharding/#how-does-mongodb-ensure-unique-id-field-values-when-using-a-shard-key-other-than-id
[17:40:56] <StephenLynx> it seems you can shard by _id
[17:40:59] <StephenLynx> and mongo will handle that.
[17:41:34] <yopp> StephenLynx, you are still missing the point. ObjectId _can't_ guarantee the order of the records by design.
[17:41:51] <StephenLynx> ok, but if you can guarantee by implementation
[17:41:56] <StephenLynx> what problem remains?
[17:42:06] <yopp> With objectid you just can't.
[17:42:20] <repxxl> yopp and StephenLynx alright, I'm going with the objectid. I will use other id formats only for human readable ids like "usernames", for example. ty for help :)
[17:42:21] <StephenLynx> and even then, natural order isn't useful.
[17:42:45] <StephenLynx> I can't remember a single time where I needed documents by their insertion order.
[17:42:58] <yopp> StephenLynx, basic example: trading
[17:43:18] <StephenLynx> you keep track using application code in a separate field.
[17:43:27] <StephenLynx> so you can make sure stuff like timezones are taken in account.
[17:43:37] <StephenLynx> and besides
[17:43:38] <yopp> oh.
[17:43:41] <StephenLynx> only the order is not useful
[17:43:47] <StephenLynx> you want the exact time.
[17:44:03] <StephenLynx> just the fact that it was inserted after or before something is useless.
[17:44:22] <yopp> you can't reliably work with time on the common hardware even on the milliseconds scale
[17:44:42] <StephenLynx> I can get nano seconds precision.
[17:44:52] <StephenLynx> milliseconds are HUGE.
[17:45:09] <StephenLynx> and if you have a problem with the hardware
[17:45:15] <StephenLynx> then the software is not your problem.
[17:45:45] <yopp> StephenLynx, it's a very complicated task to keep two separate hardware clocks in sync even on the ms scale.
[17:46:09] <StephenLynx> thats why you use a single computer for that.
[17:46:36] <StephenLynx> what is your point, after all?
[17:47:03] <yopp> ObjectId is fine for many tasks, but not for all of them.
[17:47:11] <yopp> It's not a magic silver bullet
[17:47:13] <StephenLynx> of course they are not.
[17:47:16] <StephenLynx> nothing is.
[17:47:31] <StephenLynx> if you want, lets say
[17:47:38] <StephenLynx> do some space shit on CERN
[17:47:53] <StephenLynx> regular hardware and software won't cut it
[17:48:27] <StephenLynx> but I honestly can't think of a common business scenario where the limitations you pointed out are relevant.
[17:49:08] <yopp> As I said before: almost anything related to money.
[17:49:14] <StephenLynx> kek
[17:49:34] <StephenLynx> what is your experience with financial software?
[17:51:11] <Owner_> hello
[17:51:29] <repxxl> one more question: when I have a field "type" : "real" and "type" : "unreal", and I want to find the "real" documents among 1 million documents, are only 2 index entries kept in RAM (since there are only 2 different values, "real" and "unreal"), or will 1 million index entries be kept in RAM?
[17:51:43] <yopp> StephenLynx, I've engineered a couple. Sold my soul to the devil: ad networks.
[17:52:05] <Owner_> if you are connected to a database, but using the oplog, do you only receive updates for that database?
[17:52:21] <Owner_> like from application standpoint
[18:01:13] <yopp> repxxl, 1 million index records in this case
[18:02:15] <yopp> repxxl, basically, size of that index will be pretty much the same as _id index size
[18:03:27] <yopp> if you switch to a boolean value ("real": true | false) you can reduce it by around half
[18:04:07] <yopp> maybe even a bit more, depends on the distribution
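
yopp's numbers can be checked empirically: an index keeps one entry per document no matter how few distinct values exist, and the collection stats report each index's size (collection and field names follow repxxl's example):

    db.docs.createIndex({ type: 1 })
    db.docs.stats().indexSizes   // e.g. { "_id_": <bytes>, "type_1": <bytes> }
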
[18:04:24] <StephenLynx> yopp that doesn't seem like financial software.
[18:04:28] <repxxl> yopp alright i will do it thanks
[18:04:36] <StephenLynx> I am talking about stuff used for stock trading
[18:04:39] <yopp> repxxl, test it beforehand
[18:05:36] <StephenLynx> or banks
[18:06:21] <repxxl> also, I'm asking myself: when people have 2 collections, like a users collection with a user_id field and a users_posts collection also with a user_id field, they basically share the same id references, and I have to create 2 indexes for that, which is 2x the size. that sounds like nonsense to me.
[18:06:48] <repxxl> can't just one index be created and used across different collections?
[18:07:09] <StephenLynx> no
[18:07:11] <yopp> StephenLynx, banks are slow because there's not much happening on a regular user account. Ad networks are bloody fast, you have hundreds of 1/1000¢ transactions per second
[18:07:20] <StephenLynx> and you don't have to index these relations
[18:07:29] <StephenLynx> the point is:
[18:07:38] <StephenLynx> did you care exactly WHEN someone clicked an ad?
[18:07:51] <StephenLynx> if it was clicked at 2145ms or 2146ms?
[18:07:59] <yopp> Yeah, a lot
[18:08:04] <StephenLynx> bullshit.
[18:08:27] <repxxl> StephenLynx and as for embedded documents and growing documents, I'm not a fan of that, because of the 16mb limit, which will give me a headache one day.
[18:08:28] <StephenLynx> that takes in account latency
[18:08:45] <StephenLynx> and latency can be over a whole second.
[18:09:01] <StephenLynx> why would you even care about that amount of precision on ads to begin with?
[18:09:26] <repxxl> StephenLynx "don't have to index relations"? what do you mean? I hope not growing a single document until it one day hits the 16mb limit
[18:09:49] <StephenLynx> I mean that for example
[18:10:11] <StephenLynx> you have a collection of posts, each post has a field with the id of the user that posted it.
[18:10:20] <StephenLynx> you don't have to index this field.
[18:10:24] <StephenLynx> unless you really want to.
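
StephenLynx's example in shell form (collection and field names invented): the relation works unindexed, and the index is purely an optional speed-up for reads that filter on it:

    var authorId = db.users.findOne({ login: "ada" })._id   // hypothetical users collection
    db.posts.insert({ posterId: authorId, text: "hi" })
    db.posts.find({ posterId: authorId })    // works without any index (collection scan)
    db.posts.createIndex({ posterId: 1 })    // optional: makes that lookup fast
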
[18:13:27] <yopp> StephenLynx, you just can't imagine all the corner cases for that shit.
[18:15:07] <yopp> Simple example: you need to stop the ad campaign immediately when it runs out of budget. You need to track the exact user who was the last paid one. This is a small deal for traditional CPV campaigns, but a huge one for CPA
[18:16:56] <StephenLynx> how much can, lets say, a dozen clicks, make a difference?
[18:17:08] <StephenLynx> how much money is that?
[18:17:14] <cheeser> or better, a dozen ms
[18:18:25] <yopp> StephenLynx, in CPV, almost nothing, 1/1000th of a cent. In CPA campaigns, it might be hundreds of dollars, depending on the budget and how they are paid (fixed payment or revenue share)
[18:18:54] <StephenLynx> hold on, hold on
[18:19:04] <StephenLynx> a dozen clicks, hundreds of dollars?
[18:19:17] <StephenLynx> how many clicks go through a day, usually?
[18:20:32] <yopp> Let me take a leak, and then I'll explain the difference between CPV/CPC and CPA
[18:30:36] <yopp> Okay. I'm back
[18:30:45] <yopp> The ad business is a four-way shit: advertisers (pay money), ad publishers (get some money), the platform (gets most monies), and the ad publishers' audience (consumes ads)
[18:31:51] <skokkk> by audience you mean victims.. but interesting, listening, would like to know the difference between CPV/CPC and CPA.
[18:32:02] <StephenLynx> yes
[18:32:17] <yopp> skokkk, yeah, victims sold for a penny.
[18:32:25] <yopp> in fact, for 1/1000s of the penny ;)
[18:32:35] <StephenLynx> well, if you don't block ads, you have it coming
[18:33:04] <cheeser> adblockers++
[18:33:27] <yopp> But anyway, at the beginning it was simple: a publisher has an audience that he can sell. For example 1M unique views per month.
[18:34:29] <skokkk> StephenLynx, adblock ofc, but on sites that force ads and block their content if they detect adblock.. then I am a victim.
[18:34:42] <StephenLynx> then I stop visiting the site.
[18:34:44] <cheeser> i just go elsewhere.
[18:35:27] <StephenLynx> or even better: disable javascript on the site.
[18:35:41] <StephenLynx> now it can't detect the ad was blocked and block the content.
[18:36:07] <repxxl> StephenLynx why i dont have to index this field ? i mean with index i will get faster the posts of the user if there are also many other user posts or i misundersood something on indexes
[18:36:21] <StephenLynx> as I said
[18:36:24] <StephenLynx> if you want, you can.
[18:36:27] <StephenLynx> but is not vital.
[18:36:38] <StephenLynx> everything will work without this index.
[18:36:42] <yopp> The advertiser paid for the fact that the ad was shown to this audience. It was not important what the audience did with the ad at the time. It's basically cost-per-view (CPV), but it's usually called cost per mille (CPM), because nobody buys a single view. It's sold in thousands
[18:38:08] <repxxl> StephenLynx yeah, but is it not a waste? i mean basically that index already exists with the same values in the users collection, and now i have to create the same index in the posts collection
[18:38:32] <skokkk> yopp, you seem to be very knowledgeable about this. How would a platform (such as google ads) prevent bots clicking it 1000x+ on proxies etc..
[18:38:56] <StephenLynx> I am still waiting on his explanation on how a dozen clicks costs hundreds of dollars.
[18:39:23] <yopp> At this point, you usually don't care about precision, because the cost of a view is just a fraction of a cent. Basically, when a publisher shows 1M ads, he will get, let's say, $1k. So there's no point in precise tracking, because the potential losses are cents.
[18:39:55] <skokkk> I'm starting to see what he means. You can sell $200 software/hardware and if you have one click out of 12 you will already make a profit.
[18:40:37] <yopp> Then it became clear that it just wasn't working. You could spend a fortune, but in the end you would not get any profit from the ads. For example: nobody is buying in your store.
[18:40:40] <StephenLynx> I can't see it. if you expect 12 clicks in a matter of some ms, you expect a shitload over the course of the contract
[18:41:05] <StephenLynx> these hundreds of dollars are still a drop in an ocean of millions of dollars
[18:41:24] <yopp> Then the ad business came up with cost-per-click. So nobody cares about views anymore. The publisher gets paid when someone clicks on the ad.
[18:42:08] <yopp> CPC is more expensive, so you're not getting fractions of cents, you are getting cents!
[18:43:02] <StephenLynx> yes, and actions that trigger payment are less common.
[18:43:13] <yopp> At this point, precise tracking became an issue. Because you are showing the ads on 1000s of sites at the same time, and it's kinda important to track who will get the last cents.
[18:44:25] <yopp> Because when an ad campaign has almost run out of money, there will still be pages already loaded by users.
[18:46:26] <yopp> It worked for a while, but then the internet became a huge thing, and it was like everybody was clicking on the ads (mostly accidentally). And after a while it became clear that CPC wasn't working either.
[18:46:48] <yopp> So ad business came up with a new model: Cost per action.
[18:47:57] <yopp> From the ad publisher's point of view it's kinda simple: you are getting dollars for an action that the user makes somewhere else.
[18:48:03] <yopp> Like shitload of money
[18:48:10] <yopp> (comparing to the cents)
[18:51:30] <yopp> So, you are still showing the same ad to millions of people around the world (targeting still sucks), and users are still clicking. And there's a simple situation: the ad has, like, its last $10 of budget. 1M users are seeing the same ad at the same time, and like 1k of them are clicking at the same moment.
[18:52:56] <yopp> So you need to keep precise track of the order of clicks, to decide who will get the $10.
[18:53:27] <StephenLynx> don't the clicks go through a server that redirects the user to the real URL?
[18:53:39] <StephenLynx> and this redirect server validates that?
[18:54:15] <yopp> Um, it's not like one server. It's like 10/100s of servers.
[18:54:18] <StephenLynx> and if we are talking about a deployment so large that we might get 1M of users seeing an ad at the same time
[18:54:28] <StephenLynx> and each click can cost 10 dollars
[18:54:42] <StephenLynx> how large is the budget of the person publishing the ad in the first place?
[18:54:44] <yopp> Not the click itself, the action that the user did afterwards
[18:55:32] <StephenLynx> so if these 1k users do this action on the advertised site, that's $10k in ads that was paid.
[18:57:37] <StephenLynx> and we are talking about an interval of some ms
[18:57:51] <StephenLynx> so if we expect about 10k in 100ms
[18:58:02] <StephenLynx> we expect 100k per second
[18:58:10] <StephenLynx> over half million per minute
[18:58:30] <yopp> Uh. Nope. There's a _large_ time gap between click and action
[18:58:41] <yopp> Minutes, hours, even days.
[18:58:55] <StephenLynx> ok, but I am talking about the frequency the actions happen.
[18:59:18] <StephenLynx> it doesn't matter how long they sit between click and action, as long as actions complete every X ms.
[19:01:17] <yopp> skokkk, got the point right, btw
[19:01:48] <yopp> StephenLynx, it's not about how often they do the action; it's a silly way of tracking who will be paid: the first one.
[19:01:53] <skokkk> Yopp, yes, and thank you very much for the detailed explanation.
[19:02:45] <StephenLynx> hm
[19:03:04] <yopp> StephenLynx, yeah, this is fucked up world. That's why I'm done with this shit :B
[19:03:45] <StephenLynx> actually no, it makes sense to me.
[19:04:15] <StephenLynx> I can see why it's stressful to deal with petty people that make money in a petty way.
[19:05:26] <StephenLynx> I was in contact with someone that made money out of ad revenue once.
[19:05:36] <StephenLynx> a client of the place I used to work.
[19:06:05] <StephenLynx> it was not that bad, but even him saw how it was silly to trust people to see ads when he used an ad blocker himself.
[19:06:36] <StephenLynx> and tbh, he even put with some crap from me :v
[19:13:17] <yopp> skokkk, regarding bots. Behaviour analysis.
[19:13:35] <skokkk> such as previous cookies & browsing history?
[19:14:28] <yopp> skokkk, yeah. And the fact that nobody can click or view ads on the 1000/s rate :B
[19:15:31] <skokkk> yopp, now I'm starting to feel bad for always pressing the ad (promoted result thingy) for teamviewer instead of the teamviewer (first result) in google XD
[19:16:07] <yopp> skokkk, don't be :)
[19:17:43] <yopp> Mostly, all fraud protection systems are built on outlier detection: when something doesn't fit the "baseline" profile. I'm not an expert in that field, my job is to count the monies :)
[19:18:58] <yopp> Was. Was my job. :D Right now we're working on something opposite to that.
[19:19:19] <skokkk> yopp, opposite?
[19:19:24] <skokkk> paying for ads now? xD
[19:19:31] <StephenLynx> kek
[19:20:27] <yopp> skokkk, nope :B
[19:21:07] <yopp> can't tell much, but I hope we'll have something to share in coming weeks
[19:21:36] <yopp> (btw, paying for ads is pretty much what groupon was about)
[20:11:02] <repxxl> is it a good idea to have one collection with a field inside like Doc_Type : Account, Doc_Type : Post, Doc_Type : Follow? so I would have one big collection I could easily query for everything, without the need to combine referenced information on the application side
[20:11:23] <repxxl> i mean one big collection with everything
[20:12:01] <cheeser> in general, that's the idea of a document database.
[20:14:08] <repxxl> cheeser I will generally stick with my schemas because I can't allow a document to grow beyond 16mb. I will just put them in one collection instead of multiple, and separate them by this doc_type for example, so I can query them better and more efficiently ...
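
repxxl's single-collection plan, sketched with his doc_type discriminator (the other field names are invented); an index on the discriminator plus the fields you filter by keeps the big collection queryable:

    db.everything.insert({ docType: "Account", name: "ada" })
    db.everything.insert({ docType: "Post", author: "ada", text: "hi" })
    db.everything.insert({ docType: "Follow", from: "ada", to: "bob" })

    db.everything.createIndex({ docType: 1 })
    db.everything.find({ docType: "Post", author: "ada" })
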
[20:55:34] <d-snp> hi
[20:56:51] <d-snp> our cluster just stopped accepting writes, we get this error on one of our collections:
[20:56:54] <d-snp> errmsg: \"requested shard version differs from config shard version for app.requests, requested version is 0|0||00000000000000...\", $gleStats: { lastOpTime: Timestamp 0|0, electionId: ObjectId('564b4199e8aecf48c7ea7db0') } }", "code" : 10429, "shard" : "shard5"
[20:56:59] <d-snp> }
[20:57:06] <d-snp> anyone know what that might be?
[21:05:19] <bz-> if i want to allow people (an app admin) to update a document arbitrarily, such that the values may change to whatever, so that it can't be queried consistently to identify it for the update, should i be using the object id to identify the specific document within the app .. at all times?
[21:05:42] <bz-> hopefully that makes sense
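
The answer bz- is circling: when every other field is mutable, the immutable _id is the one stable handle, so fetch it once and address all later updates to it. A sketch (collection and field names invented):

    var doc = db.appConfigs.findOne({ /* whatever locates it the first time */ })
    var id = doc._id   // _id never changes

    // Later, no matter how the admin has mutated the other fields:
    db.appConfigs.update({ _id: id }, { $set: { theme: "dark" } })
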
[21:20:37] <repxxl> can I somehow control the position when adding a field that doesn't exist, e.g. put it in the middle of the already existing fields?
[22:17:17] <GitGud> hey. I wanted to know the standard way to do a particular thing. I will have a database of posts made by a bunch of users that needs to be sorted chronologically at all times, and there will be a query from the front page of my webpage that lists the 4 most recent user posts. now my question is: what is the most efficient way to sort them, index them, and preserve the sorting in the index,
[22:17:17] <GitGud> and then make sure the query for those 4 posts uses the date-sorted index and returns the 4 most recent post objects?
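
The standard shape of the answer to GitGud's question: you don't keep the collection sorted; you index the date field and let the query's sort use the index. A sketch (field names invented):

    db.posts.createIndex({ createdAt: -1 })

    // Front page: the 4 most recent posts. The sort matches the index,
    // so the server walks the index instead of sorting in memory.
    db.posts.find().sort({ createdAt: -1 }).limit(4)

    // .explain() should show an IXSCAN on the createdAt index with no SORT stage.
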
[22:21:34] <deathanchor> grrr... do newer mongo versions handle hinting counts yet?
[22:22:54] <cheeser> newer than?
[22:24:07] <deathanchor> well back in 2.6 it wasn't possible.
[22:24:49] <cheeser> was there a bug filed against it?
[22:24:57] <deathanchor> I think so
[22:25:08] <cheeser> that'd be the first thing to check then
[22:25:57] <deathanchor> hmm... so the answer is yes it works now: https://jira.mongodb.org/browse/SERVER-2677
[22:26:16] <cheeser> w00t
[22:26:31] <cheeser> though it looks like that should've worked in 2.6
[22:26:44] <deathanchor> yeah, now I can push to move to that
[22:26:54] <cheeser> are you on 2.4 then?
[22:27:47] <deathanchor> devil's land... 2.2
[22:28:33] <cheeser> wow
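
Per SERVER-2677, a hinted count looks like this in current shells (collection, query, and index here are placeholders):

    db.events.find({ status: "open" }).hint({ status: 1 }).count()

    // Equivalent command form:
    db.runCommand({ count: "events", query: { status: "open" }, hint: { status: 1 } })
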
[23:24:35] <fxmulder> anyone know why this query is taking a long time? http://nsab.us/public/mongodb
[23:24:51] <fxmulder> there shouldn't be that many unique isp domains
[23:26:24] <Boomtime> fxmulder: try increasing loglevel to 1 or 2 on the server and observing what the server says about that command
[23:32:52] <fxmulder> hmm I set it to 2, not seeing anything related in the logs though
[23:48:25] <fxmulder> if I set it to 3 I can see that it is doing the query, but beyond that it's just noise
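
For reference, the log level Boomtime suggested raising can be changed at runtime from the shell (db.setLogLevel is the 3.0+ helper; the setParameter form also works on older servers):

    db.setLogLevel(2)                                   // 3.0+ helper
    db.adminCommand({ setParameter: 1, logLevel: 2 })   // older servers
    db.setLogLevel(0)                                   // restore the default afterwards
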