PMXBOT Log file Viewer

#mongodb logs for Monday the 29th of December, 2014
[02:04:53] <_roland> Suppose I have the zip code dataset (http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/) and I'd want to "flatten" the loc array.
[02:05:32] <_roland> So I'd like to project loc[0] as a new field called "longitude" and loc[1] as a new field called "latitude"
[02:06:42] <_roland> so I tried db.zips.findOne(null, {"loc.0":true}); - it gives me an empty array for the "loc.0" field. I thought this was the proper dot notation for array elements?
[02:09:14] <_roland> I also tried db.zips.aggregate([{$project: {_id:1, longitude: {$let: { vars: {lat: "$loc.0"}, in: "$$lat"}} }}]);
[02:09:26] <_roland> that gives me the same result - empty array.
[02:09:55] <bmillham> look at $unwind in the aggregation. That may do what you are looking for.
[02:10:26] <_roland> bmillham: I know unwind - it will generate a new doc for each index. but then I still don't know which array index belonged to which doc.
[02:11:01] <_roland> bmillham: I suppose that in this particular case I could group on _id and use $first and $last because the array happens to always have 2 elements.
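A minimal sketch of that two-element workaround against the tutorial zips collection (assuming $unwind preserves the array order, which it does in practice):

    db.zips.aggregate([
        { $unwind: "$loc" },
        { $group: {
            _id: "$_id",
            longitude: { $first: "$loc" },
            latitude:  { $last:  "$loc" }
        } }
    ])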
[02:11:28] <bmillham> Or maybe $push
[02:11:31] <_roland> bmillham: but I'm actually looking for a generic way to project any element of any index, not only those that have exactly 2 elements
[02:11:48] <Boomtime> http://docs.mongodb.org/manual/reference/operator/projection/slice/#proj._S_slice
[02:11:59] <bmillham> But I'm still learning all this...
[02:12:19] <_roland> bmillham: thanks, I'll check out $push.
[02:12:30] <_roland> Boomtime: thanks, I'll check out $slice too.
[02:13:00] <Boomtime> note that the basic projection available on .find cannot change the type of the fields (like aggregation can), it can really only do filtering
[02:13:27] <Boomtime> so you're going to get an array back, but you can control the precise content of that array
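A sketch of the $slice projection Boomtime links to above - note the result is still an array, just a shorter one:

    db.zips.findOne({}, { _id: 1, loc: { $slice: [0, 1] } })
    // loc comes back as a one-element array, not a scalar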
[02:14:07] <_roland> Boomtime: mm...I'm really looking for a way to get a scalar. Or at least, a single element.
[02:14:15] <Boomtime> meanwhile, can you explain why an array with well-defined ordinals doesn't suit your purpose?
[02:14:53] <Boomtime> i mean, projection is a client-side result, you can as easily read obj.loc[0] as obj.longitude
[02:15:44] <_roland> Boomtime: yes. well the same could be said for all projection operations. Why implement them in mongo if you can do everything on the client side, right?
[02:15:59] <Boomtime> that is not true
[02:16:19] <Boomtime> projection permits filtering - you can literally stop data from being returned - or constrain it to the index for speed purposes
[02:16:50] <Boomtime> in the case of aggregation you can aggregate results that would require large volumes of data on the client side
[02:17:04] <Boomtime> but which ultimately provide a short summary instead
[02:18:39] <_roland> Boomtime: well, I could still simply change my mind about how to represent my data and need to migrate/convert. Ideally I wouldn't need to pull anything to the client for that.
[02:19:09] <Boomtime> how will a find or aggregation avoid that?
[02:19:45] <_roland> Boomtime: it won't. your question was, why do I want to project and flatten an array in the first place. I gave a real world example.
[02:20:17] <Boomtime> your real-world example does not require projection
[02:20:41] <Boomtime> it can benefit from, at best, a simple filter
[02:21:19] <Boomtime> a more sophisticated $rename operator is what you want for that example, but it would have to have some complex logic
[02:22:23] <_roland> Boomtime: I'm afraid I don't understand. How does flattening an array of arbitrary size not require projection?
[02:22:59] <Boomtime> let us suppose you use projection to flatten this array... where does the result go?
[02:23:22] <_roland> Boomtime: It goes into the output documents of my query. Which may or may not be immediately stored into a new collection.
[02:26:33] <Boomtime> "immediately stored into a new collection"
[02:26:38] <Boomtime> via the client?
[02:26:45] <Boomtime> or are you using aggregation now?
[02:26:47] <_roland> Boomtime: rather not I hope!
[02:27:03] <_roland> Boomtime: well if that is what it takes...
[02:27:04] <Boomtime> ok, so now we're making progress.. you want to use the aggregation pipeline
[02:27:18] <Boomtime> and yes, in that context what you want makes perfect sense
[02:27:28] <Boomtime> you are changing your schema
[02:27:40] <_roland> Boomtime: yes. projecting array elements as properties.
[02:27:51] <_roland> flattening an array.
[02:28:17] <specialsauce> is it possible to create a max value index for a field? ie cause an insert to fail if you try to insert a document that has a field with a value greater than n?
[02:28:18] <Boomtime> ok, so abandon .find - it is for returning results to the client
[02:28:43] <Boomtime> use .aggregate so you can pipe directly to a collection and keep the whole operation inside the server
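A rough sketch of keeping the operation server-side with an $out stage (the target name "zips_flat" is made up, and the $project stage is only a placeholder since the array-index projection under discussion doesn't work):

    db.zips.aggregate([
        { $project: { _id: 1, loc: 1 } },   // reshaping would go here
        { $out: "zips_flat" }               // write the results straight into a collection
    ])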
[02:28:53] <_roland> Boomtime: fine. I gave samples of 2 things I tried, one with find, one with aggregate.
[02:29:20] <_roland> "(03:07:37 AM) _roland: I also tried db.zips.aggregate([{$project: {_id:1, longitude: {$let: { vars: {lat: "$loc.0"}, in: "$$lat"}} }}]); "
[02:37:42] <_roland> Boomtime: do you know why my attempt with aggregate doesn't work as intended? Am i not using the dot notation correctly for the loc array?
[02:38:46] <Boomtime> addressing arrays like that doesn't seem to be supported in aggregation - not sure why
[02:39:37] <_roland> Boomtime: I also tried to achieve the same with $splice - that doesn't seem to be supported either.
[02:39:54] <Boomtime> $slice, but no, that is a find operator only
[02:39:57] <_roland> not in aggregate $project - that gives error "invalid operator".
[02:40:30] <_roland> Boomtime: mm. I have trouble understanding why the context makes a difference. I accept that it does, but it doesn't seem to make sense to me.
[02:45:16] <Boomtime> i am surprised to learn that dotted field notation does not work to address the elements of an array in $project in aggregation.. i have tried several different things now
[02:45:31] <_roland> Boomtime: I just tried db.zips.aggregate([{$group: {_id: "$_id", lon: {$addToSet: "$loc.0"}}}]);
[02:45:54] <_roland> Boomtime: seems you're right and it doesn't work at all. Thanks for looking into it for me.
[02:45:58] <Boomtime> i would expect this to work: { lon: "$$CURRENT.loc.0" }
[02:46:22] <Boomtime> that expression shows the use of dotted notation is operational ($$CURRENT.loc works)
[02:46:37] <Boomtime> but that the extra .0 does not dereference into the array
[02:46:48] <Boomtime> you should raise a bug/feature request
[02:46:59] <_roland> Boomtime: yes, I can see in the results of other attempts that it is handling an array, since it returns an array in the result. Just an empty array.
[02:48:00] <_roland> Boomtime: thanks again. I think it should be reported as a bug. I think the dot notation does work for subdocuments, so it should work for arrays as well.
[02:49:33] <_roland> Boomtime: for example, I ran this the other day
[02:49:33] <_roland> db.nutrition.aggregate([{$unwind: "$nutrients"},{$project: {units: {$let: {vars: {b: "$nutrients.units"}, in: "$$b"} } } }]);
[02:50:03] <_roland> to flatten to "nutrients" subdocument out of the way. I think my example earlier is similar to this.
[02:50:29] <_roland> (except that loc is an array at this point, and nutrients is a subdocument.)
[02:51:56] <Boomtime> this feature request would provide one solution: https://jira.mongodb.org/browse/SERVER-4588
[02:53:21] <_roland> Boomtime: found this https://jira.mongodb.org/browse/SERVER-4589. Open unresolved.
[02:54:11] <_roland> Boomtime: ah yes. saw that the other day. However, that is slightly different since that is about reporting the index after unwind.
[02:54:41] <_roland> Boomtime: if I know what index to extract, there is no need to unwind. If I know the Dot expression, no documents need to be created unwinding the entire array.
[02:55:07] <Boomtime> right, for your use case 4589 is better
[02:55:17] <_roland> Boomtime: anyway. Thanks for thinking and looking. I appreciate it.
[02:55:50] <_roland> Boomtime: I think I'm going to have to conclude that for the time being, aggregation framework has a couple of -to me- serious maturity issues.
[02:58:38] <_roland> Boomtime: just read that this dot syntax is supported in the $match stage. I find it hard to understand why it would be built such that it doesn't support it in the other stages as well.
[02:59:45] <Boomtime> it does seem strange
[03:00:29] <Boomtime> welp, you have a solution for the 2-index condition, but not general ordinals
[03:02:48] <_roland> Boomtime: well there's always map reduce. Although that has, to the extent that I have been exposed to it, a bunch of issues of its own.
[03:03:09] <_roland> Boomtime: anyway - thanks again! It's good to know I don't need to look further, for the time being.
[03:04:12] <Boomtime> no worries, and yes, map-reduce is often a solution where aggregation can't do it, MR usually isn't recommended only because it is a high-cpu and potentially high-blocking operation
[03:06:18] <bmillham> A completely off topic question here. Has anyone used Twitter Bootstrap popover's, with AJAX to populate the popover?
[03:38:14] <rampsie> does a bulk execution provide any optimisations or is it just convenience? ie if I had a large array of doc ids that I wanted to remove would it be better to run .remove({_id: {$in: docIds}}... or build a bulk execution linkIds.forEach(function(id) { bulk.find({_id: id}).remove();}); bulk.execute()
[03:40:54] <Boomtime> rampsie: your first operation is probably more efficient at the primary - though interestingly, it translates to the same thing when replicated
[03:44:59] <rampsie> are you calling explain on the bulk execution somehow?
[03:46:44] <rampsie> how are you able to observe the steps mongo takes
[03:46:59] <Boomtime> by checking the source code, and knowing the product really quite well..
[03:47:49] <rampsie> ^^ fair enough
[03:47:56] <Boomtime> although, you can observe the steps via logs if you like
[03:48:08] <Boomtime> logLevel 1 will get you the ops on the primary
[03:48:35] <Boomtime> and reviewing the resulting entries in oplog.rs collection will confirm what i just said
[03:49:15] <Boomtime> i.e. the remove via $in is a single op received, but gets exploded in the op-log (replication) to one op per removed doc
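One way to observe this, assuming a replica set and a hypothetical test.items collection holding the docIds from the question above:

    db.adminCommand({ setParameter: 1, logLevel: 1 })   // log each op on the primary
    db.items.remove({ _id: { $in: docIds } })           // arrives as a single remove op
    use local
    db.oplog.rs.find({ op: "d", ns: "test.items" }).sort({ $natural: -1 }).limit(5)
    // shows one delete entry per removed document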
[03:49:29] <rampsie> ah
[03:50:55] <rampsie> I guess that would be optimal - rather than checking the array contents for each candidate document (complexity exploding very quickly), it's a linear number of operations
[03:55:28] <rampsie> does there exist a mechanism/strategy to cap the size of a subset of a collection? ie limit the number of documents in a collection with a particular field value?
[03:56:19] <rampsie> the value would be indexed and reference a document in another collection
[03:56:25] <Boomtime> nope, though you can put those documents in a capped collection of their own
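A minimal sketch of creating a capped collection (the name and limits are just examples):

    db.createCollection("recent_events", { capped: true, size: 1048576, max: 1000 })
    // the oldest documents are evicted automatically once a limit is hit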
[03:57:29] <rampsie> yah I had come across capped collections, however it didn't seem appropriate on account of there being a potentially unbounded number of these sub-collections
[03:58:05] <bmillham> MySQL to Mongo question here. With MySQL, I can issue a SHOW TABLE STATUS to use to be able to know when a table has changed. Is there an equivalent in MongoDB?
[03:58:11] <Boomtime> you have some business rules you need to implement - they'll need to be implemented in application logic
[03:59:32] <Boomtime> bmillham: not that i know of, very little data is retained that you didn't specifically ask for - modified times would fall in to that category
[03:59:56] <Boomtime> you can get collection stats, but it does not contain a last accessed/modified/created time
[04:00:07] <rampsie> Boomtime: right and I have done so, unfortunately the result is recursive bulk inserts which have worst-case scenarios of recursing to a depth equal to the bound I wish to enforce, creating a bulk insert object on each recurse
[04:00:36] <bmillham> I mainly just want to know if something (anything) has changed.
[04:00:45] <rampsie> I think im stuck but thought I would look around for something native/better
[04:06:42] <rampsie> is it possible to create a lock on recently inserted documents until a callback finishes? ie run an insert, lock the inserted documents check if existing documents + inserted exceed the limiting value I wish to impose and remove/trim the excess and release the lock?
[04:07:17] <Boomtime> you are describing a transaction, not supported, sorry
[04:08:54] <Boomtime> you would need to implement that non-trivial procedure yourself via the use of a lock/journal document - see the similar procedure in the article on two-phase commits: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
[04:12:00] <seiyria> Anyone here able to prod someone about this? https://jira.mongodb.org/browse/NODE-334
[04:12:10] <Aric> Is there a way to sync two mongo collections between different sites?
[04:12:33] <seiyria> you could use a replica set, Aric
[04:12:51] <Aric> would that give realtime (ish)?
[04:13:06] <seiyria> roughly
[04:13:17] <seiyria> what's your use case?
[04:13:18] <Aric> want to sync a user's db between the two sites so creating or changing a profile on either one goes to the other to make them "like" one site
[04:13:52] <seiyria> ah
[04:14:03] <Aric> i also eventually want to sync over a collection of people/bios to a second site that will use that to generate hyperlinks to the first site based on names/available etc
[04:14:03] <seiyria> so something like stackexchange, where you can create a profile on one, but sync to all the others?
[04:14:18] <Aric> ya pretty much..
[04:14:43] <seiyria> yeah, I would say a replica set would be fast enough for an operation like that
[04:15:31] <Aric> http://docs.mongodb.org/manual/core/replication-introduction/
[04:15:35] <seiyria> although I'm not certain a replica set is quite the right tool, but you could try it out and see if it works out for you.
[04:15:37] <Aric> got it...
[04:15:48] <Aric> so even if the sites are on different servers...
[04:16:29] <Aric> and it's two-way?
[04:16:39] <seiyria> if you provide both nodes write access, yes
[04:17:10] <Aric> ok, i was looking at the diagrams and description and didn't see that
[04:17:23] <Boomtime> a replica-set has only one writable node
[04:17:28] <Boomtime> the primary
[04:18:08] <Boomtime> the main idea of a replica-set is for high availability - in the event that one node fails, another may take its place
[04:18:16] <seiyria> ah, right.
[04:18:37] <seiyria> my mistake.
[04:18:43] <Aric> ok
[04:19:02] <Boomtime> Aric: the situation you describe is more complicated - you want resident independent writable data in two locations that each share a small subset of common data
[04:19:13] <Aric> i could redirect new signups to one, but that would be very difficult :/
[04:19:31] <Aric> I am really looking for a way to sync them back and forth...
[04:19:55] <seiyria> you /could/ have an authoritative node that just propagates outwards
[04:20:17] <seiyria> where all sites actually are in read-only mode of the data, but updating one sends an update to the "main" service, if you will.
[04:30:49] <rampsie> Boomtime: that is actually a decent solution - simply adding a pending field to the inserted docs... if I add a check to any reads to omit pending results but include the pending docs in write checks, that would stop race conditions while I trimmed the excess docs
[04:32:07] <rampsie> not amazing but perhaps better than deep recursion client side
[04:36:28] <joshumax> What's up with node_mongo?
[04:36:43] <joshumax> Error: Invalid mongodb uri. Must begin with "mongodb://"
[04:36:44] <joshumax> Received: "mongodb://127.0.0.1"
[04:51:14] <bmillham> joshumax: I'm not familiar with node_mongo, but is it possible that it's putting the 'mongodb://' there for you? Have you just tried the 127.0.0.1 (or localhost)?
[04:52:09] <joshumax> bmillham: placing it directly in the configuration file fixed it...it seems to be a windows-specific issue
[04:52:38] <bmillham> :-D
[04:52:58] <bmillham> Windows issues... Never... ;-)
[05:03:11] <rampsie> is there a difference between constructing a bulk insert using db.initializeUnorderedBulkOp() and inserting an array of documents?
[05:10:53] <Boomtime> rampsie: you get a little more control over the bulk insert op - inserting an array can fail halfway and it's up to you to figure out how far through it got
[05:11:08] <Boomtime> inserting an array *of documents*
[05:12:51] <rampsie> if you set ordered: false it should continue inserting the rest though like the unorderedBulkOp according to the docs
[05:13:22] <Boomtime> right, you get more control
[05:13:35] <Boomtime> ah, with array insert?
[05:13:41] <Boomtime> ok, now... which ones failed?
[05:14:38] <Boomtime> also, bulkop can do lots of ops, not just insert, but delete and update in the same batch
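A small sketch of mixing op types in one bulk object (the collection and field names are made up):

    var bulk = db.items.initializeUnorderedBulkOp();
    bulk.insert({ sku: "a1", qty: 5 });
    bulk.find({ sku: "b2" }).update({ $set: { qty: 0 } });
    bulk.find({ sku: "c3" }).remove();
    var res = bulk.execute();
    // res.nInserted, res.nModified, res.nRemoved report what each kind of op did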
[05:18:50] <rampsie> right ok.. there's no way of resuming an ordered bulk operation after an error has occurred and been handled, by chance?
[05:24:07] <Boomtime> you started off asking about the difference between insert by array method, and bulk op method
[05:24:25] <Boomtime> ordered and unordered are different bulk ops
[05:25:10] <rampsie> I realise, I have moved on from the array question
[05:25:16] <Boomtime> ok
[05:25:28] <Boomtime> it makes no sense to continue an ordered bulk op
[05:26:14] <Boomtime> under the conditions where ordered matters, any failure must necessarily invalidate future ops in the same bulk-op - otherwise it doesn't matter if they are ordered, right?
[05:27:13] <rampsie> it just occurred to me, in relation to maintaining a fixed subset of documents in a collection: I could run an ordered bulk op, and when my unique index is hit, not increment the size of the subset and proceed with the operation
[05:27:20] <rampsie> ie a synchronous insert
[05:27:31] <Boomtime> no
[05:27:58] <Boomtime> a bulk op does not assure synchronicity - the ops are basically like calling each equivalent op independently
[05:28:26] <Boomtime> if another operation from another client comes along at the same time, they will observe those operations being applied one at a time
[05:29:31] <rampsie> I don't understand how order in the bulk operation doesn't imply synchronicity
[05:29:55] <Boomtime> because another client can modify things midway
[05:30:09] <Boomtime> the database can talk with lots of clients at a time
[05:33:55] <rampsie> yah that would be problematic... I guess I was thinking synchronous client-side
[05:35:49] <rampsie> oh mongo everything is great bar this one operation and it happens to be the main operation of the application.. of course
[05:38:56] <rampsie> what sounds like a more problematic worst case? a 500-1000 depth recursive insert in the client, or a pseudo-transaction that could potentially insert 500-1000 documents then immediately remove all but one
[05:48:03] <Boomtime> rampsie: the trouble with inserting docs, even temporarily, on the server is that they are available to all clients for the period they exist - consider that when you make your choice
[05:49:10] <Boomtime> the trouble with inserting docs one at a time from the client is that you're serializing your insertion
[05:49:26] <bmillham> rampsie: I'm curious why you would bulk insert and then remove documents?
[05:49:28] <rampsie> well yes the idea was to add a pending field to the inserted documents and update the retrieval methods of my model to omit any documents containing the pending field... thus effecting a transaction
[05:49:49] <rampsie> the callback trims the excess documents and removes the pending field
[05:50:08] <rampsie> this is for the maintenance of the max number of documents of type A in a collection
[05:50:17] <rampsie> my subset limit problem I mentioned before
[05:50:27] <Boomtime> rampsie: right, something like that is a good method - be aware that the update to change the status from pending to 'done' (or whatever) is also synchronous and represents independent ops
[05:51:23] <Boomtime> ok, synchronous is not the right word there...
[05:51:34] <rampsie> a query that occurs during the update and therefore returns slightly fewer documents than should be available sounds like a reasonable compromise
[05:51:50] <rampsie> or at least doesnt present much of an issue
[05:51:53] <Boomtime> each document updated is a discrete op, even though it might look like a single op to you
[05:52:26] <Boomtime> rampsie: excellent, knowing your data helps a lot to solve this type of problem
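A rough sketch of the pending-flag approach being discussed (the collection, field names, and limit are all hypothetical):

    var MAX_PER_TYPE = 100;
    var newDocs = [ { type: "A" }, { type: "A" } ];
    // 1. insert the new documents marked as pending
    db.items.insert(newDocs.map(function (d) { d.pending = true; return d; }));
    // 2. normal reads ignore in-flight documents
    db.items.find({ type: "A", pending: { $exists: false } });
    // 3. trim any excess for this type, then clear the marker
    var excess = db.items.count({ type: "A" }) - MAX_PER_TYPE;
    // ... remove `excess` documents here if it is positive ...
    db.items.update({ type: "A", pending: true }, { $unset: { pending: "" } }, { multi: true });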
[05:57:55] <bmillham> Boomtime: question about _id. I know it's automagically generated. But does it always increase? So if I want to see any documents added after the last known document (checked in a snapshot) would looking for _id > lastobjectid work? Or can the objectids get smaller
[05:58:21] <Boomtime> objectids "usually" increase
[05:58:46] <bmillham> That's what I figured.
[05:58:55] <Boomtime> understanding how objectid gets generated helps you reason about the content: http://docs.mongodb.org/manual/reference/object-id/
[05:58:57] <rampsie> bmillham: you could just add an 'added' field
[05:59:04] <rampsie> with a Date object
[05:59:34] <Boomtime> if you know that the time on all your clients is always up-to-date, then objectids fairly reliably monotonically increment
[05:59:41] <bmillham> I'm thinking that the overhead of checking for changes and then finding the changes is less than just simply looking for the changes
[05:59:48] <Boomtime> objectid is generated client-side by default
[05:59:50] <bmillham> For what I'm doing
[06:00:08] <bmillham> For me, client-side will be the web server
[06:00:15] <Boomtime> alternatively, you can make the server generate the objectid for you
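A sketch of both directions - reading the timestamp back out of an _id, and building a "minimum" ObjectId for a known cut-off time (the songs collection is hypothetical):

    db.songs.findOne()._id.getTimestamp()      // creation time embedded in the _id
    // seconds since the epoch as 8 hex chars, padded out to a full 24-char ObjectId
    var secs = Math.floor(new Date("2014-12-29T00:00:00Z").getTime() / 1000);
    var minId = ObjectId(secs.toString(16) + "0000000000000000");
    db.songs.find({ _id: { $gt: minId } })     // documents created after the cut-off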
[06:00:23] <bmillham> Oh, mostly the web server.....
[06:00:52] <bmillham> Hmm, a little more thinking of how to 'not think the SQL way'
[06:01:38] <bmillham> rampsie: I already sorta have that, so I think that's the way I'll go.
[06:02:08] <bmillham> But I'm not using date objects. I've decided on Unix timestamps
[06:08:18] <bmillham> Completely off subject, but when I took the recycling out a bit ago, I heard a pack of Coyotes. Real close (like 1/4 mile or less). For the second time in the last several days!
[06:08:30] <bmillham> I hurried back in!
[06:15:59] <bmillham> Sorry, back on subject. Is there any real advantage to using a Date object vs just using a Unix timestamp? I don't need dates older than 1970, and I doubt that this site will be around after 2038
[06:16:46] <bmillham> And the only date calculation that I do (database side) is to look for documents added in the last 7 days.
[06:16:50] <rampsie> billham: sounds dangerous!
[06:17:24] <bmillham> rampsie: Using unix timestamps, or coyotes?
[06:17:47] <Boomtime> how do you use a coyote?
[06:18:01] <Boomtime> (sorry, i couldn't resist)
[06:18:05] <bmillham> lol
[06:18:29] <rampsie> :D coyotes.. re the timestamp I suspect they take the same amount of space in the db if thats a concern, if you want to use the timestamp client side you would have to wrap it/convert it somehow right?
[06:18:57] <rampsie> I just use the Date object because most of what I use it for is client side and the various methods are useful to what im doing
[06:19:15] <bmillham> Client side will *almost* always be the web server.
[06:19:48] <Boomtime> bmillham: a date object has the specific bson-type 'date' so every client that handles it knows to treat it as a date
[06:20:14] <Boomtime> with unix time you'll need to do that step
[06:20:35] <Boomtime> what you've described sounds like you prefer working with unix timestamps, so stick with what you know if you see no benefit
[06:20:51] <bmillham> I'll look at the Date object closer.
[06:21:48] <bmillham> But one thing I've done on the site (which is still in development) is I display times in 'human time' from now. i.e. 5 minutes ago
[06:22:21] <bmillham> And that's real easy to do with unix timestamps
[06:22:53] <Boomtime> what makes it hard with a date object?
[06:23:30] <bmillham> I guess because I already have the Python code to do it with a timestamp ;-)
[06:23:38] <Boomtime> right :p
[06:24:01] <Boomtime> so that's a 'stick with what you know' reason, which is perfectly valid, don't replace what isn't broken etc
[06:24:08] <bmillham> :)
[06:24:38] <rampsie> you can get the timestamp from the Date object of course but that seems redundant in your case
[06:24:45] <rampsie> obj.getTime()
[06:24:54] <bmillham> Or the coyotes may get me ;-) I won't even mention the bobcat I heard a few weeks ago....
[06:28:47] <bmillham> Is the Date object timezone aware? Since I'm still in the design phase, I may decide that I do want dates displayed.
[06:29:29] <bmillham> I decided on the 'from now' model, because it doesn't need to be timezone aware (from the web site)
[06:32:42] <joeyjones> Are there any major caveats to storing images and small archives in GridFS?
[06:38:14] <bmillham> Good question joeyjones, I'd like to know that also. That's something I'm considering for my site.
[06:40:49] <joeyjones> bmillham: in my case i'm looking to store a .epub file, metadata, and thumbnail in a document for each book
[06:41:25] <joeyjones> i could make do with S3 or similar, but I'm interested in seeing how GridFS would handle it
[06:41:40] <bmillham> For me it would be the thumbnail of an artist or album cover
[06:42:38] <bmillham> But I have no idea how well that would work. I have my basic database working well, but it's still changing as I'm learning
[07:08:21] <joeyjones> bmillham: i know the feeling, i just started playing with mongodb yesterday but with a background in oracle, mysql, and AWS DynamoDB it's not too big of a stretch
[07:09:48] <bmillham> Well, for me, it's been un-learning SQL
[07:11:01] <bmillham> I think it was Boomtime that suggested that I forget what I know. And that has been good advice.
[07:11:17] <bmillham> But it's hard to do
[07:11:38] <joeyjones> bmillham: the big step for most people is to get away from normalization and picture things as documents
[07:12:25] <joeyjones> i mentioned denormalization to my databases prof when i ran into her a month ago and her response was hilarious
[07:15:52] <bmillham> Trying to decide where an embedded document is better vs. a related document. I've gone with a bit of a hybrid method. I'm designing the database around what I know the web site that will be using it wants for results.
[07:39:05] <joeyjones> bmillham: it's often best to have a mockup of the site before you touch the DB, but mongo makes it a bit different
[07:39:21] <joeyjones> since a document can have extra fields added at any time
[07:39:34] <bmillham> I'm converting an existing site
[07:40:11] <bmillham> So I can just change the MongoDB when I decide it's not working right.
[07:40:19] <bmillham> Converting from a MySQL db
[07:42:57] <bmillham> And an all new site, Python based instead of php based
[08:01:23] <joeyjones> bmillham: My current debate is between having author as a field of the book collection or as a reference to an author collection
[08:01:40] <joeyjones> because one of the main user flows will be a filter by author
[08:02:14] <bmillham> joeyjones: same for me. Should I have the artist/album as an embedded document in a song?
[08:02:26] <joeyjones> I just don't know
[08:03:06] <bmillham> What I'm doing so far is to have embedded documents, but for searching I also have the artist name and album name in the song document.
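A sketch of that hybrid shape (every name here is made up): keep a reference to the artist document, but copy the fields that searches actually hit into the song itself:

    db.songs.insert({
        title: "Some Song",
        artist_id: db.artists.findOne({ name: "Some Artist" })._id,  // reference
        artist_name: "Some Artist",                                  // denormalized copy for search
        album_name: "Some Album"
    })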
[08:03:07] <joeyjones> I feel like it could be my SQL background urging me to use a reference, but trying to deal with search/filter by author as a freeform text field will be a pita
[08:03:29] <bmillham> Yep, forget SQL....
[08:03:44] <bmillham> Think about the results that you want
[08:04:33] <joeyjones> for simplicity having it as a field would be best, but from a usability perspective a reference would be best
[08:05:55] <joeyjones> i'll probably just go with field and add some logic for renaming the author field across all documents
[08:06:54] <joeyjones> i'm going to need to dive into indexing soon :p
[08:08:09] <bmillham> That's the route I'm taking. And YES, look at your indexing!
[08:08:39] <joeyjones> bmillham: in my case indexing will be more useful once i get my searches figured out
[08:08:41] <bmillham> You can index Embedded Documents also
[08:09:15] <joeyjones> i just had an interesting thought
[08:09:25] <bmillham> Look at text searches (like fuzzy search) they work real good! Much better than MySQL.
[08:09:39] <joeyjones> title, author, and publisher can be arrays
[08:10:04] <bmillham> Things like if you search for dance, results will include dancing, danced etc
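A minimal sketch of the text search being described (field names are hypothetical); the stemming is what makes "dance" also match "dancing" and "danced":

    db.songs.ensureIndex({ title: "text", artist_name: "text", album_name: "text" })
    db.songs.find({ $text: { $search: "dance" } })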
[08:10:05] <joeyjones> the idea being that i have 2 sources of data: the .epub metadata, and data pulled from google books
[08:11:11] <joeyjones> if the .epub has an isbn present then google books is authoritative, but otherwise i do a search based on the title, author, and publisher and the resulting data could be wrong
[08:11:37] <joeyjones> actually, it shouldn't...
[08:11:49] <joeyjones> am i over-thinking my edge cases...
[08:13:56] <joeyjones> bmillham: yeah, full text search is one reason why i'm thinking that a reference would be a bad idea
[08:14:11] <bmillham> So you are not converting an existing database, you are designing one from scratch
[08:14:20] <joeyjones> yeah
[08:14:24] <joeyjones> this is a side project
[08:15:00] <bmillham> You could still use a reference, but store the search data in the main document. That's what I'm doing
[08:15:03] <joeyjones> a web app to host my .epub books for my kobo reader that i can use to simplify getting books onto the ereader and to create reading lists
[08:15:30] <joeyjones> I'd rather handle 1 edge case than add another :p
[08:16:26] <bmillham> In my case, the Songs collection is the main collection, where most searching will happen. The Artists and Albums collections are secondary. Used when looking for a specific artist or album.
[08:17:20] <joeyjones> i'm thinking of just handling the filters by author/etc as a string filter
[08:17:28] <joeyjones> not the best for indexing but it's simple
[08:24:58] <bmillham> Well, if you have the authoritative data, why add the other data? Remember, MongoDB doesn't have to have the same fields in each document!
[08:33:21] <bmillham> Bedtime for me
[08:33:28] <bmillham> Nite joeyjones
[08:33:38] <joeyjones> cya
[08:59:57] <joeyjones> I was right, boogle can give me shitty responses
[09:00:00] <joeyjones> *Google
[11:07:16] <MadLamb> hey, i accidentally created a collection name with ";]" at the end and now i can't delete it, i tried the square-bracket notation but the drop returns false.
[11:37:11] <giuseppesolinas> hello
[11:38:18] <giuseppesolinas> hopefully simple question: how do I get a usable URL for a collection in my db for a specified user? can I generate it from the console?
[12:38:40] <repxl> how can i find the records with name : "something" http://pastebin.com/3z6SuMdu
[12:39:12] <arussel> {"records.name": "something"}
[12:40:52] <Neo9> can any one help on this problem
[12:40:58] <Neo9> https://pastee.org/5vya3
[12:41:35] <repxl> arussel cool, works. i thought i needed the $ operator like records.$.name since records is an array
[12:44:17] <arussel> repxl: not for such a simple query. $ or elemMatch would be to match a specific element of an array using multiple criteria
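A short sketch of the distinction arussel describes (the collection and the extra field are made up):

    // matches if any element of the records array has name "something"
    db.things.find({ "records.name": "something" })
    // $elemMatch when one and the same element must satisfy several criteria
    db.things.find({ records: { $elemMatch: { name: "something", active: true } } })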
[12:45:16] <repxl> arussel can i get only the "structure" array with findOne ?
[12:47:03] <repxl> arussel it finds me the full document ... i want find to return only the array "structure" - can i do this ?
[12:51:08] <Neo9> https://pastee.org/yau47
[12:51:19] <Neo9> can any one please help on this?
[12:59:36] <repxl> can i make findOne return only "structure" ? not the full document http://pastebin.com/3z6SuMdu
[13:10:26] <yakari> hi
[13:10:37] <yakari> i have a problem with a find query since i have created indexes
[13:10:41] <yakari> someone can help me ?
[13:11:15] <rbistolfi> repxl: not an expert, but I think findOne({ query ...}, {structure: 1}) should do it
[13:12:07] <yakari> i use a geonames database and mongodb does not return me result when i use particular fields
[13:14:14] <repxl> rbistolfi ty
[13:15:54] <rbistolfi> hey
[13:19:04] <yakari> allo ?
[13:52:12] <repxl> getTimestamp() is javascript ?
[14:13:51] <repxl> i want to sort my documents by date. i have read that the ObjectID has the date built into it, so do i need to retrieve the date from the _id first and then sort on it, or how?
[14:19:17] <repxl> items.find.sort( [['_id', -1]] ) will give me order by created date ?
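Since the timestamp is the leading portion of an ObjectId, sorting on _id orders by creation time (to the second, and assuming reasonably synchronized client clocks) - a minimal form:

    db.items.find().sort({ _id: -1 })   // newest first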
[14:39:47] <asturel_> if i have a docs like this: { "_id" : ObjectId("54a16500460bd7c07e8c5109"), "player" : { "serial" : "2C815A80E6C04F36BB807EA7D8DC0792", "ip" : "78.165.7.218", "name" : "ferdi", "id" : 423192 }, "time" : ISODate("2014-12-29T14:28:16Z"), "atmid" : 11, "amount" : -43431 } -- why db.bank.findOne({$contains:{"player.name":"ferdi"}}) gives null?
[14:40:55] <asturel_> but db.bank.findOne({"player.name": /.*ferdi.*/i}) works
[14:42:31] <cheeser> $contains isn't an operator?
[14:48:07] <asturel_> idk, i found on http://stackoverflow.com/questions/10610131/checking-if-a-field-contains-a-string
[14:59:15] <jaitaiwan> asturel_: could even do {"player.name":"ferdi"} and it should work.
[15:00:22] <asturel_> yeah it does but only for the exact match
[15:00:50] <asturel_> and i didnt want to overkill with regexp
[15:01:01] <jaitaiwan> Then you can use {'player.name':{'$regex':'ferdi'}}.
[15:01:02] <jaitaiwan> AH
[15:01:15] <asturel_> :D
[15:01:53] <asturel_> there's no simple string operator like 'like' ?
[15:02:11] <asturel_> or i shouldnt care about that?
[15:02:57] <jaitaiwan> I wouldn't worry about it. Mongodb is all javascript (kinda). So regex is the natural way to map quickly I think.
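For reference, the "contains"-style match spelled out with $regex and a case-insensitive option, using the names from the document above:

    // roughly the equivalent of SQL LIKE '%ferdi%', case-insensitive
    db.bank.find({ "player.name": { $regex: "ferdi", $options: "i" } })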
[15:08:14] <asturel_> ok thanks
[15:14:12] <repxl> does mongodb save date in UTC ?
[15:18:19] <jaitaiwan> repxl: the ISODate object which mongodb uses to store the date contains the timezone encoded with it from what I understand. So simple answer is, if you tell it to.
[15:19:55] <jaitaiwan> repxl: my default installation of mongodb on ubuntu saves the ISODates with the timezone of zero which I think is equiv to UTC. Just have to make sure the datetime you give it is in utc as well.
[16:46:17] <valenciae> Hello everybody! - What is data reaping, does anyone know?
[16:46:48] <valenciae> Ignore, I found the answer.
[17:32:03] <hicker> I keep getting intermittent mongodb connection errors... Do you think that's a load balancing issue on the server? Or am I establishing too many connections in node?
[20:39:37] <Nilium> Can anyone think of a reason why the 1.4.x node.js mongodb driver would yield a parseError when doing db.collection(col).findOne({ _id: mongodb.ObjectID("some_id") }) with a 16mb document other than that it's a 16mb document and that's probably horrible?
[20:40:50] <MadLamb> hey, i accidentally created a collection name with ";]" at the end and now i can't delete it, i tried the square-bracket notation but the drop returns false.
[20:50:19] <bazineta> Weird one; perhaps try the db.getCollection("collection_name;]").drop() format
[21:35:04] <Nilium> Looks like the findOne issue I had is probably just related partly to the socket timeout used and the mongo driver being outdated.
[21:50:45] <unholycrab> does the local database get replicated to secondaries?
[22:20:13] <Daniel_yo> Hey how do I update a field in ALL the documents of a collection ?
[22:21:23] <Daniel_yo> Movie.update({}, {$set: {yearRank: 999}}, {multi: true}, callback);
[22:21:27] <Daniel_yo> does not work ..
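For comparison, a plain shell form of that multi-update (the movies collection name is assumed):

    db.movies.update({}, { $set: { yearRank: 999 } }, { multi: true })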
[22:21:39] <elux> im thinking of trying 2.8-rc4 .. and currently my replica set is on 2.6.6 .. is this a clean swap, or do i have to do something?
[23:07:26] <d4rklit3> hi
[23:07:39] <d4rklit3> having a hard time with this query: db.tweets.find({$where:{date:{'&lte':'2014-12-22T05:00:00.000Z'}}})
[23:07:46] <d4rklit3> getting this error: "$err" : "Can't canonicalize query: BadValue $where got bad type",
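The error appears to be because $where expects a JavaScript expression rather than a query document; a plain range match on the date field (note $lte, not &lte) would look like:

    db.tweets.find({ date: { $lte: ISODate("2014-12-22T05:00:00.000Z") } })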