
#mongodb logs for Saturday the 8th of August, 2015

[02:31:31] <Mia> Hello there channel
[02:31:47] <Boomtime> hi
[02:31:49] <Mia> I need to get the nth document in my collection, what's the most efficient way to do it?
[02:32:02] <Mia> .skip seems really, really slow with a huge number of docs
[02:32:32] <Boomtime> yep, that's because .skip() is basically the same as enumerating the documents you asked for, just on the server instead of at the client
[02:32:49] <Mia> I realized Boomtime - so what else can I do
[02:32:52] <Boomtime> you are far better off coming up with a query to get what you actually want
[02:33:25] <Mia> well this is for an imageboard thing, all of my images have domain.com/id urls, so domain.com/1 is the first image
[02:33:35] <Mia> so I just want to do domain.com/n to get nth image
[02:33:47] <Boomtime> why not just ask for that then?
[02:34:02] <Mia> what do you mean
[02:34:16] <Mia> That's actually what I asked
[02:34:23] <Mia> (I believe)
[02:35:01] <Boomtime> let's say you want the 9th image...
[02:35:06] <Mia> yeah
[02:35:09] <Boomtime> thus, "domain.com/9" is the url right?
[02:35:13] <Mia> yes exactly
[02:35:23] <Boomtime> so why not query for the url that matches "domain.com/9" ?
[02:35:35] <Mia> Ah no, no, let me clarify
[02:36:13] <Boomtime> how about you pastebin (or equivalent) your current query, and an example document
[02:36:16] <Mia> the image board is mine, I have images stored in my mongodb (I mean url's of images, not the actual images) and I want to render the page with the corresponding urls when a domain.com/id page is visited
[02:36:19] <Boomtime> and what you want
[02:36:26] <Mia> so let's say domain.com/9 is visited, then I should serve the 9th image in my db
[02:36:51] <Mia> my current query is just find and skip basically, which is slow
[02:37:08] <Mia> I just do find().skip(n).limit(1)
[02:37:10] <Boomtime> so, if somebody was to insert an image ahead of the 9th (say, at 8th position) wouldn't all the later images now be wrong?
[02:37:21] <Mia> no it won't happen
[02:37:34] <Mia> the page self generates, no user interaction
[02:37:41] <Mia> it just adds images over time
[02:37:47] <Boomtime> you are relying on a specific order of output that you have not specified
[02:37:56] <Mia> yes
[02:38:08] <Boomtime> what compulsion has the database got to preserve an order that you haven't specified?
[02:38:17] <Boomtime> answer: none
[02:38:26] <Mia> Hm - what should I do
[02:38:29] <Boomtime> you need to specify what it is you want
[02:38:36] <Mia> I thought the order of addition would remain the same, always
[02:38:41] <Boomtime> i would suggest you add a field to your document that is unambiguous
[02:38:42] <Mia> or let's say, the order of _id's
[02:39:05] <Boomtime> then why not use the _id as the image identity? that way you can make an extremely efficient query
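
For context, a minimal sketch of the direct lookup Boomtime means, assuming a collection named "images" (hypothetical) and a URL that carries the document's _id:

    // One walk down the always-present unique index on _id,
    // regardless of collection size -- no server-side skipping.
    db.images.findOne({ _id: ObjectId("55c5f9e1a2b4c8d0e1f23a45") })  // hypothetical id
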
[02:39:10] <Mia> I can do my query as .find({},{"_id":1}).skip(n).limit(1) as well
[02:39:25] <Mia> Boomtime, because visitors want to see the first image, for instance
[02:39:33] <Boomtime> that still doesn't express your intent
[02:39:51] <Mia> I mean there is no way for others to check earlier images if I just use the document id
[02:40:08] <Mia> but now, people can just go and check domain.com/995 for instance, to see that image
[02:40:15] <Mia> and they can hunt for cool images by visiting
[02:40:37] <Boomtime> why can you not visit earlier images when using an _id?
[02:40:47] <Mia> Not sure if this makes sense but I really need them to be in order because of the social behavior :]
[02:40:56] <Mia> because there is no way they can guess an _id
[02:41:13] <Mia> and there is no index file to list all images, that's the whole point
[02:41:24] <Boomtime> yeah, you need them in *order*, but you're still not expressing that order
[02:41:33] <Mia> Hm.
[02:41:43] <Mia> the order is _id order
[02:41:47] <Boomtime> the _id can give you an order, which is all that you need
[02:41:53] <Mia> so nth image in _id order is only via .skip I believe
[02:41:54] <Mia> which is slow
[02:42:12] <Boomtime> do not use .skip
[02:42:19] <Mia> what else can I do then
[02:42:22] <Boomtime> you are going to need to learn some new tools
[02:42:26] <Mia> that's why I'm here
[02:42:31] <Boomtime> like, what a query actually is
[02:42:37] <Mia> I'm open to suggestions really, completely
[02:42:37] <Boomtime> you have not yet used a query at all
[02:43:00] <Mia> So in my case what would you suggest? because I've been trying to solve this for the last 6 days
[02:43:01] <Mia> :(
[02:43:07] <cheeser> skip is fine so long as the number is small but the server will iterate over the first n-1 docs to get to that nth doc
[02:43:13] <Boomtime> ok, first: just accept this part: let's use the _id
[02:43:26] <Mia> for addressing?
[02:43:37] <Mia> domain.com/documentid you mean?
[02:43:43] <Boomtime> yes
[02:43:46] <Mia> it's not possible
[02:43:56] <Mia> if it was, I would just do that
[02:44:00] <Boomtime> then you can't be helped, because it is absolutely possible
[02:44:06] <Mia> it would break the whole social convention of the website, and the whole reason
[02:44:18] <Boomtime> no, it really really wouldn't
[02:44:27] <Mia> No, I mean, let me put it this way. I don't want domain.com/somerandomid
[02:44:35] <Mia> I want domain.com/imagenumberN
[02:44:36] <Boomtime> but you are declaring it impossible before actually stopping to let me continue
[02:44:59] <Boomtime> you need to actually search for what you want
[02:45:07] <Mia> Boomtime, I did, I've been searching for the last few days
[02:45:18] <Mia> the solution you are suggesting is changing all of the links online, everywhere
[02:45:22] <Boomtime> you keep stating these fantastic parameters for using search and then declaring them not usable
[02:45:22] <Mia> which is what I need to avoid
[02:45:54] <Mia> Boomtime, what is not clear about my problem
[02:46:32] <Mia> The problem is: I want to reach my db documents with an order number
[02:46:38] <Boomtime> if you cannot change the URLs that already exist, then change the documents that are stored, to have the parameter you need to search on
[02:46:40] <Mia> but I can't do it efficiently
[02:46:56] <Mia> Hm
[02:47:03] <Boomtime> then add a specific field to the document which is "index_number" or some such
[02:47:11] <Boomtime> that way you can hit it exactly and efficiently
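
A sketch of the scheme Boomtime describes, assuming an "index_number" field (the name is illustrative) kept in step with the URL numbers:

    // One-time: index the position field so lookups never scan.
    db.images.createIndex({ index_number: 1 }, { unique: true })

    // Serving domain.com/9 is then a single indexed fetch:
    db.images.findOne({ index_number: 9 })
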
[02:47:23] <Boomtime> btw, you can use the _id for this purpose
[02:47:24] <Mia> That makes sense - but then I will need to increment manually right?
[02:47:32] <Mia> because it's not relational it won't auto-increment
[02:47:58] <Mia> Boomtime, enlighten me please, this might be my solution
[02:48:00] <Boomtime> that is true, but it's not hard to construct
[02:48:21] <cheeser> well, relational isn't really relevant to *that*
[02:48:25] <Boomtime> the simplest (though slightly naive) solution is to use .count() as the index number
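
The naive version might look like this; it is racy if two inserts run concurrently (both read the same count), which is why it is naive:

    // Use the current document count to derive the next position.
    var next = db.images.count() + 1;
    db.images.insert({ index_number: next, url: "http://example.com/some-image.png" })
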
[02:48:38] <Mia> I mean I was just thinking, since there is an index, I could just *magically* use that index/order to get the nth document -- but looks like it's not that easy
[02:48:54] <cheeser> you can use findAndModify() for atomically increasing numbers if you need
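
cheeser's findAndModify() suggestion is the counter pattern from MongoDB's create-an-auto-incrementing-field tutorial; a sketch:

    // One counter doc per sequence; $inc is atomic, so concurrent
    // inserts each receive a distinct number.
    function getNextSequence(name) {
        return db.counters.findAndModify({
            query: { _id: name },
            update: { $inc: { seq: 1 } },
            new: true,    // return the post-increment document
            upsert: true  // create the counter on first use
        }).seq;
    }

    db.images.insert({ index_number: getNextSequence("images"), url: "http://example.com/some-image.png" })
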
[02:49:19] <Mia> cheeser, excuse my noobness - I was dealing with getting a random document from my db last week, and everyone was suggesting I use a relational db just because of its powers for random and auto-incremental stuff
[02:49:24] <Boomtime> well, you've kind of dug yourself a hole - you've implemented something without understanding its behavior
[02:50:12] <Mia> true
[02:50:14] <Mia> I'm learning
[02:50:32] <Mia> and this is a beautiful pain in the end
[02:50:35] <cheeser> neither random access nor "auto incremental stuff" are aspects of a relational database.
[02:50:36] <Boomtime> every implementation of random i've seen (eg. mysql order by rand) only works on small scales
[02:51:13] <Mia> I really like mongodb because of its ease of use with nodejs
[02:51:38] <cheeser> by random, do you mean indexed access? "get me item 17?"
[02:51:39] <Mia> I really don't plan on going for a relational db at this point - so I will probably just pick the best solution possible for my case and go with it
[02:51:48] <Boomtime> consider that the very term "random" is the exact opposite of what a database is intended to do
[02:52:16] <Mia> cheeser, now I don't need it any more, but basically what I needed was to get a random document from the whole db
[02:52:38] <cheeser> fwiw, 3.2 will get you that: https://jira.mongodb.org/browse/SERVER-533
[02:52:41] <Mia> I did it by adding a random float to each of my documents and doing a greater than query with 1 limit
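
Mia's method, spelled out as a rough sketch (the sort is implied; without it "the first doc above r" isn't well defined):

    // Each document carries a random float in [0, 1):
    db.images.insert({ url: "http://example.com/some-image.png", rand: Math.random() })

    // "Random" pick: the first document whose rand is >= a fresh r.
    // A doc's selection chance is proportional to the gap below its
    // rand value (Boomtime's bias), and r > max(rand) returns nothing.
    var r = Math.random();
    db.images.find({ rand: { $gte: r } }).sort({ rand: 1 }).limit(1)
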
[02:53:06] <Mia> oh that's nice to know!
[02:53:29] <Mia> so cheeser now what I need is to get document by index
[02:53:43] <Mia> and Boomtime was guiding me about it
[02:54:19] <Mia> since find().skip(n).limit(1) is too slow for huge number of docs - I need an alternate method
[02:54:35] <Boomtime> "Mia: I did it by adding a random float to each of my documents and doing a greater than query with 1 limit" <- this is a horrendously biased method
[02:55:03] <Mia> I know, but I don't need accuracy for that one
[02:55:07] <cheeser> by position you mean?
[02:55:08] <Boomtime> if you actually test the distribution you get from that method you'll find that some documents get selected far more often than others - often by orders of magnitude
[02:55:30] <Mia> and the other method I tried was, getting the total number of documents, and using the magical inefficient .skip() :/ Boomtime
[02:55:33] <Boomtime> and some documents may never be selected (for some it might actually be impossible)
[02:55:52] <Mia> Boomtime, yes I know, if no docs are returned I just run it once again
[02:56:07] <Mia> I know it's not a good solution it's just what I could find
[02:56:15] <Mia> cheeser, yes, by position
[02:56:35] <Mia> if I can learn how this is done efficiently I will correct my stupid random method as well
[02:57:11] <cheeser> get the first and last on whatever range you want. subtract their time components. multiply that by some random number between 0 and 1. add *that* to your min key's timestamp. search for any ID > than that. limit(1)
[02:57:15] <cheeser> *boom*
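
Spelled out as a shell sketch, assuming default ObjectIds (whose leading four bytes encode the creation time in seconds):

    // Endpoints of the range, in _id order:
    var first = db.images.find().sort({ _id: 1 }).limit(1).next();
    var last  = db.images.find().sort({ _id: -1 }).limit(1).next();

    // A random instant between the two creation times:
    var lo = first._id.getTimestamp().getTime();
    var hi = last._id.getTimestamp().getTime();
    var t  = lo + Math.random() * (hi - lo);

    // Synthesize an ObjectId at that instant (8 hex digits of seconds
    // + 16 zero digits) and take the first document above it:
    var hex = Math.floor(t / 1000).toString(16);
    db.images.find({ _id: { $gt: ObjectId(hex + "0000000000000000") } }).limit(1)
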
[02:57:44] <Mia> Oh nice
[02:58:35] <Mia> how about getting a document from a given order/index
[02:58:51] <cheeser> what?
[02:59:20] <Boomtime> that method is not random either - it will also have horrendous bias that reflects your insert load periods
[02:59:51] <cheeser> until $random lands, it's close enough for government work
[02:59:59] <Mia> cheeser, what I mean is -- when I do find({},{"_id":1}).skip(n).limit(1) -- I can get nth item
[03:00:06] <Mia> so without using skip how can I do it
[03:00:16] <cheeser> like i just told you.
[03:00:18] <Mia> I mean I need "nth item" in my collection
[03:00:20] <Boomtime> do you mean $sample? because guess how it's implemented...
[03:00:33] <Mia> n is not random --- specific nth item
[03:00:44] <Mia> cheeser, random question was another one
[03:00:44] <Boomtime> i think my point is that it is specifically NOT good enough for government work
[03:01:12] <Mia> so let's say, I want to get 1500th document from my collection
[03:01:15] <Boomtime> if you want a sample of documents where you don't care if they are truly random, then that method is fine
[03:01:18] <cheeser> Boomtime: yeah. $sample, rather.
[03:01:35] <Boomtime> and i'm quite certain that is why it's called $sample, because that is what it does
[03:01:53] <cheeser> reservoir sampling i believe is the term they're using.
[03:02:10] <Boomtime> right, it's biased from what i've seen so far, so $sample is a good name
[03:02:32] <cheeser> we're already starting to use it internally
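
For reference, the 3.2 syntax in question:

    // $sample draws documents pseudo-randomly server-side (3.2+):
    db.images.aggregate([ { $sample: { size: 1 } } ])
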
[03:04:50] <Mia> so what do I do with the nth item problem
[03:05:32] <Mia> I really don't want to annoy you with my questions or anything but I've been looking for an answer for almost a week now
[03:05:35] <cheeser> you could fetch *just* the IDs, limit(n), get(ID)
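
That is, something like the following: still an O(n) walk, but over _id index entries only, not full documents (a sketch; n is the desired 1-based position):

    // Project only _id and take the first n entries in _id order:
    var ids = db.images.find({}, { _id: 1 }).sort({ _id: 1 }).limit(n).toArray();
    // The nth document in _id order:
    db.images.findOne({ _id: ids[n - 1]._id })
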
[03:06:01] <Mia> yes but the id is not an order
[03:06:04] <Mia> it's just in order, right
[03:06:34] <Mia> I mean what I want to do is to provide a specific number and get the document at that index
[03:06:40] <cheeser> well, /n/ comes from somewhere. i don't think i know enough of your data model and needs to really bang together a solution as such
[03:06:41] <Mia> so 5 would get me 5th document
[03:06:51] <cheeser> just randomly toss ideas out there and see what clicks.
[03:07:39] <Mia> what do you mean
[03:08:01] <cheeser> i mean, i can only suggest ideas. you'll have to use/shape them according to your system and needs
[03:08:21] <Mia> Yes but I'm stuck
[03:08:32] <Mia> and my question seems simple
[03:08:56] <Mia> I just need to get document from a specific index that's all :(
[03:10:53] <cheeser> and that index comes from where?
[04:24:04] <crazydip> BSON null = 0 bytes? so { a: null } takes as much space as theoretical { a: }?
[04:28:48] <Boomtime> crazydip: i suppose you can say it that way - http://bsonspec.org/spec.html
[04:29:08] <Boomtime> the null type code has no value parameter
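
Concretely, per the spec linked above, { a: null } encodes in eight bytes, with no value bytes at all for the null:

    08 00 00 00   int32 total document length (8)
    0a            element type: null
    61 00         cstring field name "a"
    00            document terminator
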
[04:29:21] <crazydip> Boomtime: yeah i read that, was not exactly sure though
[04:29:29] <crazydip> Boomtime: thanks, makes sense :)
[04:29:55] <Boomtime> are you implementing your own bson lib, or need this for some other purpose?
[04:30:25] <Boomtime> there is a C bsonlib which takes care of the encoding/decoding, but gives you direct access to the bson if you want that
[04:30:46] <crazydip> Boomtime: no, using pymongo
[04:31:32] <Boomtime> ok, that doesn't really say what you're doing with it though :p
[04:32:01] <defk0n> can anyone help with this weird behaviour I'm getting with the $and operator? http://pastebin.com/4f6trdY2 - it seems like mongodb tries to fetch each field independently
[04:32:57] <defk0n> its so weird
[04:33:03] <defk0n> my brain hurts thinking about it
[04:33:29] <crazydip> Boomtime: i was just wondering literally about how much space it takes up :) question just popped into my head and i was curious
[04:36:20] <defk0n> there is an error in what I said gets returned from mongo; updated version: http://pastebin.com/ns8mFdSH
[04:39:03] <crazydip> defk0n: i'm a total mongodb newb, but it looks like syntax "error"
[04:39:13] <crazydip> defk0n: http://docs.mongodb.org/manual/reference/operator/query/and/#op._S_and
[04:41:36] <crazydip> defk0n: try this (disregard the last line): http://pastebin.com/hmtPa4XL
[04:42:36] <crazydip> defk0n: fixed?
[04:43:26] <defk0n> crazydip: no, still gives the same return
[04:43:35] <defk0n> crazydip: :(
[04:45:19] <defk0n> thats even weirder
[04:45:45] <defk0n> it returns identical results for my wrong query and your correct one
[04:47:22] <Boomtime> defk0n: what you are seeing is correct behavior, both of your predicates can be satisfied (by different array elements)
[04:47:34] <Boomtime> what you want however is probably this: http://docs.mongodb.org/manual/reference/operator/query/elemMatch/
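
The distinction, sketched with made-up field names (the pastebin contents aren't preserved in the log): plain $and lets each predicate be satisfied by a different array element, while $elemMatch requires one element to satisfy all of them.

    // Matches if SOME element has type "a" and SOME element
    // (possibly a different one) has value 1:
    db.things.find({ $and: [ { "arr.type": "a" }, { "arr.value": 1 } ] })

    // Matches only if a SINGLE element has both:
    db.things.find({ arr: { $elemMatch: { type: "a", value: 1 } } })
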
[04:51:19] <defk0n> boomtime thanks, that did the job
[04:52:48] <defk0n> how do i find out in explain() if there is a full table scan ?
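
For reference, explain() reports this directly (collection and field names are placeholders):

    db.things.find({ "arr.type": "a" }).explain()
    // 2.6 and earlier: "cursor" : "BasicCursor" means a full scan.
    // 3.0+: a { "stage" : "COLLSCAN" } in winningPlan means the same.
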
[07:43:51] <kenalex> hi guys
[07:44:42] <kenalex> are there any mongodb use cases outside CMS, online gaming stats data stores and social networking?
[07:47:06] <jamiel> kenalex: To begin with, what would make you think those are the only three use cases out of every possible piece of software which requires a database?
[08:03:10] <kenalex> jamiel: those are what I've come across so far
[08:03:52] <kenalex> I am trying to find other use cases, to understand how mongodb is used to solve those problems and the challenges encountered when using it
[08:15:09] <dddh> kenalex: you need big data sets to play with?
[08:18:37] <kenalex> no. why do you ask?
[08:19:23] <kenalex> is mongodb recommended only for big datasets ?
[08:20:15] <dddh> no
[08:20:22] <dddh> mongodb is not recommended
[08:20:26] <dddh> at all
[08:23:30] <kenalex> what type of applications is mongodb used in other than the ones i mentioned earlier?
[08:30:45] <amz3> kenalex: analytics for instance
[08:30:58] <amz3> I know mongodb because of analytics actually
[08:31:27] <amz3> not only for gaming stats, other kinds of analytics too
[08:32:52] <amz3> I have a question regarding the mongodb implementation. How do you implement indexes using wiredtiger, do you use a particular table for each column?
[08:51:14] <alexi5> Is it normal to have mongodb deployments with only one node ?
[08:52:29] <joannac> sure
[08:53:30] <joannac> it's not very failsafe, but people do it
[08:56:34] <alexi5> Ok
[09:00:22] <jamiel> Morning all, having an issue where we have added some new shards, and can see that they are balancing and receiving chunks but from our app servers we are seeing the following error: multiple errors for op : write results unavailable from 192.168.3.16:27170 :: caused by :: Location28563 cannot send batch write operation to server 192.168.3.16:27170
[09:00:23] <jamiel> (192.168.3.16) :: and :: write results unavailable from 192.168.3.15:27172 :: caused by :: Location28563 cannot send batch write operation to server 192.168.3.15:27172 (192.168.3.15)
[09:00:38] <jamiel> have confirmed that all nodes can talk to each other
[09:03:07] <jamiel> This happens when performing an update on any of the sharded collections
[09:22:11] <amz3> seems like mongodb doesn't use the index facility of wiredtiger
[09:22:17] <alexi5> Do you guys think a polling application is a good use case for mongodb ?
[09:22:32] <amz3> alexi5: yes, why not?
[09:23:29] <alexi5> Ok cool
[09:27:38] <CatMartyn> Hello! Does anyone know the new address of the documentation for Mongoid 4.0? The old link goes to the docs for version 5.0.
[09:44:03] <jamiel> Looks like the records are updating, but that error is still returned
[11:36:24] <kenalex> this video really opened up my eyes about mongodb : https://www.youtube.com/watch?v=b1BZ9YFsd2o
[17:26:28] <dddh> kenalex: have you used it since you viewed it?
[17:35:18] <dddh> kenalex: did you use an rdbms before, or did you just start using mongo as your first database?
[17:57:26] <Axy> How can I add an auto-incrementing field to my mongo db
[17:57:33] <Axy> I mean to my collection
[17:57:56] <Axy> it's an existing collection, I just want to give an incremental "custom_id" to all of the previous items
[17:58:05] <Axy> and from now on I would like to add things incrementally
[18:50:37] <Axy> cheeser, I've been looking for the solution to my problem I've explained yesterday - I've made a topic for it maybe you can give me some insight http://stackoverflow.com/questions/31897103/get-nth-item-from-a-collection
[21:00:55] <carver404> hi, i'm working with minimongo using browserify. However, it seems to be adding only a max of 50 documents per collection! why is
[21:01:07] <carver404> this happening? afaik 50 is really small compared to the usual max number of docs in a collection. any pointers?
[21:04:15] <Axy> I've followed this tutorial to create an auto incrementing field. http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/ It works when I'm adding new items, but how can I use it to update a collection I already have?
[21:04:20] <Axy> I use the "counters" method in the link
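
One way to backfill, assuming the getNextSequence() counter helper from that tutorial and a collection named "images" (the name is a stand-in): walk the existing documents in _id order and stamp each one.

    // Stamp every not-yet-numbered document, oldest _id first:
    db.images.find({ custom_id: { $exists: false } }).sort({ _id: 1 }).forEach(function (doc) {
        db.images.update(
            { _id: doc._id },
            { $set: { custom_id: getNextSequence("images") } }
        );
    });
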
[22:05:32] <defk0n> someone told me that only the first operator inside an aggregation query can use indexes, but i need to $unwind an array so i can group on the array fields inclusively
[22:05:46] <defk0n> how do i go about that then?
[22:06:57] <defk0n> or do i need to add an index on the array itself and the array fields respectively, so unwind won't use a BasicCursor (without indexes)
[22:07:11] <defk0n> when doing $unwind as the first operator
[22:26:17] <defk0n> nobody?
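
For what it's worth, the usual shape of the answer to defk0n's question: $unwind never uses an index itself, so the index-eligible work (a leading $match and/or $sort) goes in front of it. A sketch with hypothetical fields:

    db.things.aggregate([
        // Index-eligible: a leading $match can use an index on tags.name.
        { $match: { "tags.name": "mongodb" } },
        // $unwind then fans out only the already-filtered documents:
        { $unwind: "$tags" },
        { $group: { _id: "$tags.name", total: { $sum: 1 } } }
    ])
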