[00:52:04] <dusky_> Hello, can anyone help me with a single question?
[00:52:19] <joannac> It would help if you just asked the question
[00:53:11] <dusky_> I'm new to nosql, I've read the docs but couldn't put together a single query
[00:53:28] <dusky_> I have a json imported into a collection, like this: "itens":{ "Sticker | Banana":{ "value":1 }, "Sticker | Bash (Holo)":{ "value":2.46 }, "Sticker | Bish (Holo)":{ "value":2.15 }
[00:53:45] <dusky_> How can I get the value of an item named "Sticker | Banana"?
[01:01:17] <dusky_> I think I didn't import it with an attribute
[01:01:44] <dusky_> for each row imported I have an attribute named "value"
[01:02:50] <dusky_> look at this json, I think it is incorrect and I need to add an attribute like itemName for each item?
[01:12:00] <dusky_> I have this API that returns a json for me, http://api.ncla.me/itemlist.php, how can I format it and add the attribute itemName before each item, for example: "item"Sticker | Banana":{ "value":1 },
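[Aside: a minimal sketch of both approaches, assuming a collection named "items" holding the pasted document; the nested key names come from the sample above.]

    // Query the nested key directly with dot notation (the key contains spaces, so it must be quoted)
    db.items.findOne(
        { "itens.Sticker | Banana": { $exists: true } },
        { "itens.Sticker | Banana.value": 1 }
    )

    // Or restructure "itens" as an array so each item carries an itemName attribute, e.g.
    //   { itens: [ { itemName: "Sticker | Banana", value: 1 }, { itemName: "Sticker | Bash (Holo)", value: 2.46 } ] }
    // which can then be queried and projected like this:
    db.items.findOne(
        { "itens.itemName": "Sticker | Banana" },
        { "itens.$": 1 }
    )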
[01:16:50] <fxmulder_r_u> so I have an index of which isp_domain is the first key, assuming there are less than 50 unique domains, $db->run_command( { distinct => "table", key => "isp_domain" } ); should return pretty much right away no?
[01:32:13] <fxmulder> it obviously isn't using the index and I'm not sure why
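[Aside: a hedged way to check this from the mongo shell; the collection name "table" is taken from the command above, and explain() on distinct() needs a reasonably recent shell, so older setups would stick to runCommand.]

    // Confirm the index exists and that isp_domain is its first key
    db.table.getIndexes()

    // The distinct itself; with a usable index this should return almost immediately
    db.runCommand({ distinct: "table", key: "isp_domain" })

    // On newer shells, explain() shows whether the distinct scans the index or the whole collection
    db.table.explain().distinct("isp_domain")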
[04:29:51] <godzirra> Can anyone help me figure out what's going on? I'm trying to use "mongo --eval" and I'm having a lot of trouble. https://gist.github.com/slooker/85199345eab6d7419790
[04:43:06] <Boomtime> godzirra: bash weirdness, i am not sure what is going on, but it isn't the mongo shell
[04:43:15] <godzirra> Yeah, I just can't figure out what the deal is.
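[Aside: the gist isn't reproduced here, but "bash weirdness" with --eval is very often a quoting problem: bash expands $ and splits on spaces before mongo ever sees the script. A minimal sketch, with made-up database and collection names:]

    # single quotes keep bash from expanding $gt or splitting the JavaScript argument
    mongo mydb --eval 'db.mycollection.find({ qty: { $gt: 5 } }).forEach(printjson)'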
[05:29:22] <crk> is it viable to implement cookie based retention of my preferred mongodb version for the documentation pages?
[05:29:33] <crk> it's irksome to have to select 2.6 each time :(
[05:30:09] <crk> if this sounds like something that could be done, please direct me to where I must request it officially (bug tracker, request tracker etc.)
[05:32:03] <joannac> crk: just keep going to https://docs.mongodb.org/v2.6/ ?
[05:32:20] <joannac> rather than just https://docs.mongodb.org/manual/ (which goes to the current version)
[05:32:30] <crk> joannac, more often than not I find myself arriving at mongodb documentation through links from other pages
[05:32:38] <crk> I do make sure I save my link URLs with the full path (including version)
[05:33:09] <crk> so whenever I get there, I have to set the desired version first, before reading on. forgetting to do so has led me to trouble a few times :P
[05:33:36] <crk> would it not be smarter if the documentation remembered what version I was browsing last, and defaulted to it?
[05:33:56] <joannac> I can see both sides of this. I think it's fraught with danger if I can link you to something and not know if you actually see the same thing I do
[05:34:25] <crk> joannac: that's an angle I did not consider... hmm.
[05:34:48] <joannac> personally, it would drive me nuts. But I am not the typical user :)
[05:47:19] <joannac> yeah, for sure. having to switch versions all the time because one person is on 2.6 so needs 2.6 docs, and another is on 3.0 so needs 3.0 docs
[13:04:35] <Mattias> Rafibd01717: You don't have to create a schema beforehand, you can just insert data however you want. And it handles JSON natively! So you can search it without problems. Try following a mongodb tutorial or something.
[13:05:26] <Rafibd01717> Mattias: so the flexible schema is the one benefit?
[13:35:29] <StephenLynx> no, the lack of schema validation is not the only difference that can be beneficial. documents having sub-documents is very useful too.
[13:35:48] <StephenLynx> the ability to add more servers to a cluster without having to turn the db off
[13:35:55] <StephenLynx> and the ease with which you can do so.
[13:36:11] <StephenLynx> also, mongo has better performance when dealing with larger datasets.
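[Aside: a minimal sketch of the schemaless/sub-document point, with made-up collection and field names.]

    // No schema is declared beforehand; the nested "address" sub-document is stored as-is
    db.people.insert({
        name: "Ada",
        address: { city: "London", postcode: "NW1 6XE" }
    })

    // Fields inside the sub-document can be queried directly with dot notation
    db.people.find({ "address.city": "London" })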
[13:57:15] <dddh> can mtools only install several instances on the same server?
[15:51:03] <repxxl> how can I sort multiple mongo cursors by date? I mean my database schema is such that I can't get everything with one query, I use multiple collections
[15:55:12] <StephenLynx> since you can't operate on multiple collections at once, it makes no difference to your application how many cursors you are working with.
[15:55:24] <repxxl> StephenLynx yeah but I need them to be combined and sorted .. you know
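[Aside: since one query can't span collections, the combining has to happen application-side. A minimal sketch in shell JavaScript, assuming two made-up collections that both carry a "date" field:]

    // Fetch both result sets, concatenate them, and sort by date (newest first)
    var a = db.collectionA.find().toArray();
    var b = db.collectionB.find().toArray();
    var merged = a.concat(b).sort(function (x, y) { return y.date - x.date; });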
[16:09:14] <repxxl> Cheeser also, not on the application side. what even is MongoDate as a format?
[16:09:27] <cheeser> though you can normalize them if you want. but the date object will still track those values even if you 0 them out
[16:10:30] <repxxl> cheeser you know, because when I sort by _id I get the correct order, but when I sort by MongoDate I just don't get the right order because it is too precise
[16:14:29] <repxxl> cheeser also, why is the _id index not automatically marked "unique" by default?
[16:17:21] <repxxl> or is it because I'm on an older version of mongo, I don't know ...
[16:18:00] <cheeser> it just doesn't show for whatever reason. but it's unique.
[16:18:42] <repxxl> cheeser ok, so when I now set my own unique ids in the _id field I don't need to createIndex with unique: true, right? it's automatic
[16:18:52] <repxxl> cheeser because I want to change the _id to my custom unique ids
[16:19:10] <cheeser> you can't change existing _id values, fwiw
[16:19:55] <repxxl> cheeser oh, I thought I could change the _id. so it's fixed and I can't do anything with it, right?
[16:21:50] <StephenLynx> no, you will need to create a separate unique index.
[16:22:00] <StephenLynx> I do that often and completely ignore _id
[17:22:53] <StephenLynx> usually I define some human-friendly unique indexes.
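[Aside: a minimal sketch of such a separate unique index, with made-up collection and field names; on 2.6 the call would be ensureIndex rather than createIndex.]

    // _id keeps its auto-generated ObjectId; the application-level id gets its own unique index
    db.posts.createIndex({ postId: 1 }, { unique: true })
    db.posts.insert({ postId: 1, text: "first post" })
    db.posts.insert({ postId: 1, text: "duplicate" })   // rejected with a duplicate key error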
[17:24:06] <StephenLynx> what are you concerned with?
[17:24:49] <repxxl> StephenLynx well, I'm reading this http://stackoverflow.com/questions/6645277/should-i-implement-auto-incrementing-in-mongodb especially the answer "42", which starts with "I strongly disagree with author"
[17:25:50] <repxxl> StephenLynx so I'm concerned that in the future my human-friendly unique indexes will be too large, in bytes and RAM usage I mean. I want to do it efficiently
[17:29:00] <StephenLynx> you are storing forum posts.
[17:29:07] <StephenLynx> and you have multiple sub-forums
[17:29:35] <StephenLynx> and posts have a unique auto-incremented id per sub-forum, making it possible for two posts to have the same id, as long as they live in different forums
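[Aside: a hedged sketch of that per-sub-forum counter pattern, which is also what the Stack Overflow link above discusses; all collection, field, and function names here are made up.]

    // One counter document per sub-forum; findAndModify increments and returns the new value atomically
    function getNextPostId(boardName) {
        return db.counters.findAndModify({
            query: { _id: boardName },
            update: { $inc: { seq: 1 } },
            new: true,
            upsert: true
        }).seq;
    }

    db.posts.insert({ board: "general", postId: getNextPostId("general"), text: "hello" })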
[17:29:35] <repxxl> StephenLynx yeah, I get you, but wait: you use _id like everyone else and also a human-readable string id like me, but the human-readable string will take too much RAM to be indexed; you could just live with only the _id
[17:30:04] <StephenLynx> a base64 string is not human readable.
[17:30:16] <yopp> StephenLynx, incremental ids are crucial for ordering
[17:32:38] <StephenLynx> how can you get a different date?
[17:33:03] <yopp> you're not following: ordering _can't_ be guaranteed by the timestamp
[17:33:18] <repxxl> StephenLynx I have the same for users like you; for my "links" I use _id and an access_id, which is something like "RHU5wt2upKMh", a 12-character base64 string which is also indexed
[17:33:46] <StephenLynx> yopp with _id I am pretty sure it is obtained by a single machine.
[17:40:30] <yopp> repxxl, generally you should not care about the id's in the urls. Nobody cares. Major drawback of the ObjectId is the size. If you have a lot of small records (it's kinda where mongo sucks a lot, even with WT), you might have storage overhead for the _id field and index of the size of the data.
[17:42:20] <repxxl> yopp and StephenLynx alright, I'm going with the ObjectId. I will use other id formats only for truly human-readable ids like "usernames", for example. ty for the help :)
[17:42:21] <StephenLynx> and even then, natural order isn't useful.
[17:42:45] <StephenLynx> I can't remember a single time where I needed documents by their insertion order.
[17:51:29] <repxxl> one more question: when I have a field "type" : "real" or "type" : "unreal", and I want to find the "real" documents among 1 million documents, are only 2 index entries kept in RAM since there are only 2 different values ("real" and "unreal"), or will 1 million index entries be kept in RAM?
[17:51:43] <yopp> StephenLynx, I've engineered a couple. Sold my soul to the devil: ad networks.
[17:52:05] <Owner_> if you are connected to a database, but using the oplog, do you only receive updates for that database?
[17:52:21] <Owner_> like from application standpoint
[18:01:13] <yopp> repxxl, 1 million index records in this case
[18:02:15] <yopp> repxxl, basically, size of that index will be pretty much the same as _id index size
[18:03:27] <yopp> if you switch to a boolean value ("real": true | false) you can reduce it by around half
[18:04:07] <yopp> maybe even a bit more, depends on the distribution
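[Aside: a minimal sketch of that boolean alternative, with a made-up collection name; the actual savings would need to be checked against the stats() output on real data.]

    // Convert the string field to a boolean flag and index the flag instead
    db.docs.update({ type: "real" },   { $set: { real: true  } }, { multi: true })
    db.docs.update({ type: "unreal" }, { $set: { real: false } }, { multi: true })
    db.docs.createIndex({ real: 1 })

    // Compare index sizes before and after
    db.docs.stats().indexSizes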
[18:04:24] <StephenLynx> yopp that doesn't seem like financial software.
[18:04:28] <repxxl> yopp alright i will do it thanks
[18:04:36] <StephenLynx> I am talking about stuff used for stock trading
[18:06:21] <repxxl> also, I'm asking myself: when people have 2 collections, like a users collection with a user_id field and a users_posts collection that also has a user_id field, they basically share the same id references, and I have to create 2 indexes for that, which is 2x the size. that also sounds like nonsense to me.
[18:06:48] <repxxl> can't there just be one index created and used across different collections?
[18:07:11] <yopp> StephenLynx, banks are slow, because there's not much happening on the regular user account. Ad networks are bloody fast, you have hundreds of 1/1000 ¢ transactions per second
[18:07:20] <StephenLynx> and you don't have to index these relations
[18:08:27] <repxxl> StephenLynx and as for embedded documents and growing documents, I'm not a fan of this because of the 16MB limit, which will give me a headache one day.
[18:08:28] <StephenLynx> that takes latency into account
[18:08:45] <StephenLynx> and latency can be over a whole second.
[18:09:01] <StephenLynx> why would you even care about that amount of precision on ads to begin with?
[18:09:26] <repxxl> StephenLynx "don't have to index relations"? what do you mean? I hope not growing a single document until it one day hits the 16MB limit
[18:10:11] <StephenLynx> you have a collection of posts, each post has a field with the id of the user that posted it.
[18:10:20] <StephenLynx> you don't have to index this field.
[18:10:24] <StephenLynx> unless you really want to.
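[Aside: a minimal sketch of the point above, with made-up collection and field names; the reference field works without an index, and the index only pays for its RAM if the lookup is frequent on a large collection.]

    // Each post simply stores the _id of its author; no index is required for correctness
    db.posts.insert({ userId: ObjectId("507f191e810c19729de860ea"), text: "hello" })
    db.posts.find({ userId: ObjectId("507f191e810c19729de860ea") })

    // Optional: add the index only if this lookup needs to be fast at scale
    db.posts.createIndex({ userId: 1 })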
[18:13:27] <yopp> StephenLynx, you just can't imagine all the corner cases for that shit.
[18:15:07] <yopp> Simple example: you need to stop the ad campaign immediately when it runs out of budget. You need to track exactly which user was the last paid one. This is a small deal for traditional CPV campaigns, but a huge one for CPA
[18:16:56] <StephenLynx> how much of a difference can, let's say, a dozen clicks make?
[18:18:25] <yopp> StephenLynx, in CPV, almost nothing, 1/1000th of a cent. In CPA campaigns, it might be hundreds of dollars. Depending on the budget and how they are paid (fixed payment or revenue share)
[18:30:45] <yopp> The ad business is a four-way shit show: advertisers (pay money), ad publishers (get some money), the platform (gets most of the money), and the ad publishers' audience (consumes ads)
[18:31:51] <skokkk> by audience you mean victims.. but interesting, listening, would like to know the difference between CPV/CPC and CPA.
[18:35:27] <StephenLynx> or even better: disable javascript on the site.
[18:35:41] <StephenLynx> now it can't detect the ad was blocked and block the content.
[18:36:07] <repxxl> StephenLynx why don't I have to index this field? I mean, with an index I will get the user's posts faster when there are also many other users' posts, or did I misunderstand something about indexes
[18:36:38] <StephenLynx> everything will work without this index.
[18:36:42] <yopp> The advertiser paid for the fact that the ad was shown to this audience. It was not important what the audience did with the ad at that time. It's basically cost-per-view (CPV), but it's usually called cost per mille (CPM), because nobody buys a single view. It's sold in thousands
[18:38:08] <repxxl> StephenLynx yeah but is it not a waste? I mean, basically those indexes already exist with the same values in the users collection, and now I have to create the same indexes in the posts collection
[18:38:32] <skokkk> yopp, you seem to be very knowledgeable in this. How would a platform (such as google ads) prevent bots clicking it 1000x+ on proxies etc..
[18:38:56] <StephenLynx> I am still waiting on his explanation on how a dozen clicks costs hundreds of dollars.
[18:39:23] <yopp> At this point, you usually don't care about precision, because the cost of a view is just a fraction of a cent. Basically, when a publisher shows 1M ads, he will get, let's say, $1k. So no point in precise tracking, because potential losses are cents.
[18:39:55] <skokkk> I'm starting to see what he means. You can sell $200 software/hardware and if you have one click out of 12 you will already make a profit.
[18:40:37] <yopp> Then it became clear that it just wasn't working. You can spend a fortune, but in the end you will not get any profit from the ads. For example: nobody is buying from your store.
[18:40:40] <StephenLynx> I can't see it. if you expect 12 clicks in a matter of some ms, you expect a shitload over the course of the contract
[18:41:05] <StephenLynx> these hundreds of dollars are still a drop in an ocean of millions of dollars
[18:41:24] <yopp> Then the ad business came up with cost-per-click. So nobody cares about views anymore. The publisher gets paid when someone clicks on the ad.
[18:42:08] <yopp> CPC is more expensive, so you're not getting fractions of cents, you're getting cents!
[18:43:02] <StephenLynx> yes, and actions that trigger payment are less common.
[18:43:13] <yopp> At this point, precise tracking becomes somewhat of an issue, because you are showing the ads on 1000s of sites at the same time, and it's kinda important to track who will get the last cents.
[18:44:25] <yopp> Because when the ad campaign has almost run out of money, the ad will still be on pages that users have already loaded.
[18:46:26] <yopp> It worked for a while, but then the internet became a huge thing, and it was like everybody was clicking on the ads (mostly accidentally). And after a while it became clear that CPC wasn't working either.
[18:46:48] <yopp> So the ad business came up with a new model: cost per action (CPA).
[18:47:57] <yopp> From the ad publisher's point of view it's kinda simple: you are getting dollars for an action the user takes somewhere else.
[18:51:30] <yopp> So, you are still showing the same ad to millions of people around the world (targeting still sucks), and users are still clicking. And here's a simple situation: the campaign has like its last $10 of budget, 1M users are seeing the same ad at the same time, and like 1k of them are clicking at the same moment.
[18:52:56] <yopp> So you need to keep precise track of the order of clicks, to decide who will get the $10.
[18:53:27] <StephenLynx> don't the clicks go through a server that redirects the user to the real URL?
[18:53:39] <StephenLynx> and doesn't that redirecting server validate it?
[18:54:15] <yopp> Um, it's not like one server. It's like 10s or 100s of servers.
[18:54:18] <StephenLynx> and if we are talking about a deployment so large that we might get 1M of users seeing an ad at the same time
[18:54:28] <StephenLynx> and each click can cost 10 dollars
[18:54:42] <StephenLynx> how large is the budget of the person publishing the ad in the first place?
[18:54:44] <yopp> Not the click itself, the action that the user did afterwards
[18:55:32] <StephenLynx> so if these 1k users do this action on the advertised site, that's $10k paid in ads.
[18:57:37] <StephenLynx> and we are talking about an interval of some ms
[18:57:51] <StephenLynx> so if we expect about 10k in 100ms
[18:58:02] <StephenLynx> we expect 100k per second
[18:58:10] <StephenLynx> over half million per minute
[18:58:30] <yopp> Uh. Nope. There's a _large_ time gap between click and action
[19:13:35] <skokkk> such as previous cookies & browsing history?
[19:14:28] <yopp> skokkk, yeah. And the fact that nobody can click or view ads at a 1000/s rate :B
[19:15:31] <skokkk> yopp, now I'm starting to feel bad for always pressing the ad (promoted result thingy) for teamviewer instead of the teamviewer (first result) in google XD
[19:17:43] <yopp> Mostly, all fraud protection systems are built on outlier detection: when something doesn't fit the "baseline" profile. I'm not an expert in that field, my job is to count the monies :)
[19:18:58] <yopp> Was. Was my job. :D Right now we're working on something opposite to that.
[19:21:07] <yopp> can't tell much, but I hope we'll have something to share in coming weeks
[19:21:36] <yopp> (btw, paying for ads is pretty much what groupon was about)
[20:11:02] <repxxl> is it a good idea to use one collection and have inside it a field like Doc_Type : Account, Doc_Type : Post, Doc_Type : Follow? that way I would have one big collection where I could easily query everything without needing to combine referenced information on the application side
[20:11:23] <repxxl> i mean one big collection with everything
[20:12:01] <cheeser> in general, that's the idea of a document database.
[20:14:08] <repxxl> cheeser I will generally stick with my schemas because I can't allow a document to grow beyond 16MB; I will just put them in one collection instead of multiple and separate them by this doc_type, for example, so I can query them better and more efficiently ...
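[Aside: a minimal sketch of that single-collection, doc_type layout; all names are made up.]

    // One collection holds several document "kinds", distinguished by a doc_type field
    db.content.insert({ doc_type: "account", username: "alice" })
    db.content.insert({ doc_type: "post",    userId: 1, text: "hello" })
    db.content.insert({ doc_type: "follow",  userId: 1, followsUserId: 2 })

    // A compound index with doc_type as the prefix keeps queries on one kind efficient
    db.content.createIndex({ doc_type: 1, userId: 1 })
    db.content.find({ doc_type: "post", userId: 1 })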
[20:57:06] <d-snp> anyone know what that might be?
[21:05:19] <bz-> if I want to allow people (an app admin) to update a document arbitrarily, such that the values may change to whatever, so that it can't be queried consistently to identify it for the update, should I be using the ObjectId to identify the specific document within the app .. at all times?
[21:20:37] <repxxl> can I somehow control the position when setting a field that does not exist yet, e.g. put it in the middle of already existing fields?
[22:17:17] <GitGud> hey. I wanted to know the standard way to do a particular thing. I will have a database of posts made by a bunch of users that needs to be kept sorted chronologically at all times, and there will be a query from the front page of my webpage that lists the 4 most recent user posts. now my question is: what is the most efficient way to sort and index them, preserving the sort order in the index,
[22:17:17] <GitGud> and making sure the query for the first 4 posts uses the date-sorted index and returns the first 4 post objects in that order?
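[Aside: a minimal sketch of the usual answer, with made-up collection and field names: a descending index on the post date lets the sort and limit walk the index directly.]

    // Index on the post date, descending to match the "newest first" query
    db.posts.createIndex({ createdAt: -1 })

    // Front-page query: the 4 newest posts, served in index order
    db.posts.find().sort({ createdAt: -1 }).limit(4)

    // explain() can confirm the sort is satisfied by the index rather than done in memory
    db.posts.find().sort({ createdAt: -1 }).limit(4).explain()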