[01:10:13] <Leeol> hahuang65: How incremental is the counter in ObjectId()?
[01:10:44] <hahuang65> what does that mean? I can show you consecutively created documents if you want
[01:10:46] <Leeol> i.e., is it always incremented per document (starting from a random number, per the docs)?
[01:11:48] <Leeol> hahuang65: I have some, but i'm not positive the incremented value of an ObjectId is actually unique. I assume there are cases where it isn't, or else it wouldn't be made up of the timestamp, machine identifier, process, and counter
[01:12:46] <hahuang65> I believe you're right. I don't think it's absolutely guaranteed to be unique, but it's very, very unlikely... I mean, you can calculate the chances of duplication, right? It's a finite number of bits...
[01:14:09] <Leeol> hahuang65: Well, the chances of duplication would be some algorithm i don't know, hah. I assume the counter is tied to sharding, and that's the catch. Hell, the documentation says the counter starts with a random value, but all of mine start at 00001, so... yeah
[01:14:39] <hahuang65> I don't believe the counter starts with a random value. At least that's not what I've experienced.
[01:14:56] <Leeol> Same, i'm just saying what the docs claim
[01:22:57] <hahuang65> Leeol: not sure how to help you man
[01:23:32] <Leeol> hahuang65: It's all good, you tried. Thanks for the discussion :)
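For reference, the classic ObjectId layout is 12 bytes: a 4-byte timestamp, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter, which is why duplicates are so unlikely even though the counter alone isn't unique. A minimal pymongo/bson sketch decoding those fields:

```python
# Minimal sketch: decode the classic 12-byte ObjectId layout
# (4-byte timestamp, 3-byte machine id, 2-byte pid, 3-byte counter).
from bson import ObjectId

oid = ObjectId()          # generated client-side by the driver
raw = oid.binary          # the 12 raw bytes

timestamp = int.from_bytes(raw[0:4], "big")
machine = raw[4:7].hex()
pid = int.from_bytes(raw[7:9], "big")
counter = int.from_bytes(raw[9:12], "big")

# Consecutive ObjectIds from one process share the first 9 bytes and
# differ only in the counter.
print(timestamp, machine, pid, counter)
```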
[02:01:31] <federated_life1> how do I remove the config server list in mongo so I can use a different mongo?
[02:06:55] <X-Jester> does anyone happen to have experience getting "await data" working with tailable cursors with mongodb in ruby? i even tried to add_options the constant, no dice
[02:17:06] <bjori> X-Jester: you know it only works on capped collections, right?
[02:20:06] <X-Jester> bjori: indeed, i've got it capped and set to a size of about 1mb (each record is only a dozen K or so)
[02:20:26] <X-Jester> bjori: i've hit a new problem - i think i may have found the right combination of constants (or maybe not), but now i get "error: no ts field in query" when i try to open the cursor
[02:22:51] <X-Jester> looks like it happens when i add "REPLY_AWAIT_CAPABLE"
[02:23:39] <X-Jester> my basic issue here is that i can't seem to get the ruby mongodb driver to block on a tailable cursor's next_document call
[02:25:35] <X-Jester> as i understand it, that decision is up to mongo itself, based on those connection option bits
[03:17:40] <X-Jester> reply_capable is a ruby constant in the driver that, according to the wire protocol spec, is invalid
[03:19:13] <X-Jester> so, simply specifying the await_data constant on the cursor with add_option causes has_next? to block for about 5 seconds. so, i've got the desired behavior by looping around !cursor.closed and using has_next? to widen the loop.
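The same tailable-plus-await pattern exists in other drivers; a hedged pymongo (3+) sketch, assuming a capped collection named `log`, mirrors what X-Jester arrived at in Ruby:

```python
# Hedged sketch: block on a tailable cursor against a capped collection.
# TAILABLE_AWAIT asks the server to wait a few seconds for new data
# before returning, rather than the client busy-polling.
import time
from pymongo import MongoClient, CursorType

coll = MongoClient().test.log   # must be a capped collection

cursor = coll.find(cursor_type=CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for doc in cursor:
        print(doc)
    time.sleep(1)   # the cursor dies if the collection starts out empty
```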
[03:19:34] <FrankBullitt> Hi, brand new to Mongo/NoSQL concepts. I was wondering, is there an idea of document types/specifications in Mongo? That is, if I have a collection of cars, can I have a formal document type for trucks, sedans, etc., like classes in OOP? Or would I just store car_type as an attribute of each of the car collection documents?
[03:20:12] <X-Jester> FrankBullitt: i don't want to give bad advice but i will take a crack at this: i *think* bson marks up the object type of your object when you insert it, so that it can be recast when it's retrieved
[03:25:16] <FrankBullitt> X-Jester: Thanks for taking a crack at it. I tried to Google "recast" to find more documentation about this concept, but can't find anything. Is there perhaps a different parlance in Mongo world?
[04:20:56] <mattgordon> Is it possible to refer to a field value of a document during a query to filter it?
[04:21:53] <mattgordon> or is it only via mapreduce?
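One answer besides mapReduce, era-appropriate for this log, is the `$where` operator, which runs a JavaScript predicate per document and can compare one field against another. A hedged sketch with hypothetical field names `spent` and `budget`:

```python
# Hedged sketch: filter documents where one field exceeds another.
# $where evaluates JavaScript per document and cannot use indexes,
# so it is slow on large collections.
from pymongo import MongoClient

db = MongoClient().test
for doc in db.accounts.find({"$where": "this.spent > this.budget"}):
    print(doc)
```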
[06:45:56] <darklrd> hi, i want to make a chat logging system using mongodb.. i have the following fields -> from, to, time, message.. i want to log messages as documents in a single collection..
[06:46:13] <darklrd> how do i get inbox view from mongodb?
[06:46:45] <darklrd> i plan to make a compound index using from, to and timestamp
[06:47:24] <darklrd> but then how do i get inbox result for a single user
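One point worth noting: an index whose leading field is `from` cannot serve an inbox query that only constrains `to`, so the inbox view wants an index starting with `to`. A hedged pymongo sketch of the schema described above:

```python
# Hedged sketch of the chat-log schema: one document per message, an index
# led by "to" (an index led by "from" would not serve an inbox-only query),
# and the inbox fetched newest-first.
from pymongo import MongoClient, ASCENDING, DESCENDING

db = MongoClient().chat
db.messages.create_index([("to", ASCENDING), ("time", DESCENDING)])

def inbox(user, limit=50):
    # everything addressed to this user, newest first; the index covers
    # both the filter and the sort
    return db.messages.find({"to": user}).sort("time", DESCENDING).limit(limit)

for msg in inbox("alice"):
    print(msg["from"], msg["message"])
```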
[08:15:02] <shmoon> so if i want to store the _id from the users collection in the tasks collection, do I store the ObjectId or just its string representation as a string?
[10:41:17] <Garo_> nice. after fucking up an index replacement I got my primary to a load average of 624.58, 603.46, 428.22, and rs.status() showed that I was running a multi-master configuration :D
[11:09:45] <synchrone> it definitely has bugs, some of which occur only on newer mongodb versions
[11:09:48] <strigga> :D Sounds like "it's the best there is"
[11:10:01] <strigga> synchrone: OK I will look at that. Thanks for now
[11:10:13] <synchrone> yes, that is why i am writing a userspace gridfs driver
[11:10:59] <synchrone> maybe there are better options for linux
[11:18:41] <gyre007> guys, I have a q about replica sets and stale members... do I get this right that I need to fully resync a member IF timeDiffHours returned by db.getReplicationInfo() is smaller than syncedTo returned by db.printSlaveReplicationInfo() ?
[11:18:53] <gyre007> I might be missing something though... apologies for being a n00b
[11:24:45] <kali> gyre007: well, tFirst in db.getReplicationInfo() is the time of the oldest record in your oplog
[11:25:00] <kali> gyre007: and syncedTo is the place your replica is
[11:26:04] <kali> gyre007: for the resync to have a reasonable chance to work, you need timeDiff/timeDiffHours to be bigger than the expected time to perform the resync
[11:57:15] <skazhy> Hi! I am setting up backup snapshotting for my MongoDB sharded cluster. The docs page on EC2 backup & restore refers only to data volumes; can I apply the same backup strategy to shard config instances too?
[11:58:36] <dorong> Nodex: wow, that's much easier than what I expected - I started playing (and failed) with all sorts of array structures…. thanks
[12:10:06] <BadCodSmell> I'm using quiet but errors are still going out to stdout
[12:10:35] <BadCodSmell> I don't even seem to get a bad return value
[12:12:42] <broth> hello guys, i have a ridiculous question
[12:13:15] <broth> mongorestore returns an auth failure with a password that contains *
[12:13:48] <broth> i tried with '' and "" but it still returns an auth failure
[12:15:13] <BadCodSmell> ah I get an exit code when I am not piping so much
[12:59:27] <synchrone> I have ~100GB of files averaging ~180KB each. They are all stored in one GridFS bucket, and their filenames are legacy Windows backslash-separated path strings. Is there something wrong with this setup?
[13:03:24] <synchrone> Now I want to re-implement the directory-tree traversal, but mapReduce operations on a collection of this size are much too painful.
[13:03:51] <synchrone> Which part should I try to optimize ?
[14:00:50] <remonvv> I was looking into an SO question and some dude is getting cursor timeout errors while doing inserts using the PHP driver. Is my cold actually causing brain issues or am I right in assuming cursors aren't involved in inserts?
[14:01:06] <remonvv> ron you smell like sweet flowers in the wind
[14:09:28] <richwol> Hi all. Does anyone know if it's possible to use dynamic keys with the aggregation framework's $addToSet or $push? For example, instead of '$addToSet' => array('foo'=>'bar') I want to do '$addToSet' => array('$fieldname','bar')
[14:15:20] <Nodex> I think the best way is to try it
[14:15:46] <richwol> Gdgd. When I run the query I get an invalid operator error. Just wondered if there was a different way of doing it or a different command or something
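The error richwol sees is expected: accumulator *values* may be expressions such as '$fieldname', but output *keys* must be literal. The usual workaround is to fold the would-be key into the grouping _id. A hedged sketch (field names hypothetical):

```python
# Hedged sketch: you cannot compute an output key, but you can collect a
# field's *values* with $addToSet, or vary the grouping by the field itself.
from pymongo import MongoClient

db = MongoClient().test
pipeline = [
    {"$group": {"_id": "$category", "values": {"$addToSet": "$fieldname"}}},
]
for row in db.things.aggregate(pipeline):
    print(row["_id"], row["values"])
```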
[14:15:52] <remonvv> NodeX, -1 is risky. It's a definite memory leak unless you explicitly close them. The real issue is why is it throwing the error in the first place. The PHP code involved seems dodgy to me but I'm certainly not a PHP guy.
[14:23:12] <remonvv> Nodex, yes but that doesn't change much. Neither uses client cursors.
[14:23:17] <Nodex> what I would say is happening is this... the memory on the machine is paging and causing the locks
[14:23:40] <Nodex> 300k + results in a loop - you can bet it's all done in memory and nothing is unlinked
[14:24:00] <remonvv> He's providing _id values so the save becomes an update on _id which is cursorless.
[14:24:36] <remonvv> Well I don't think the lock % is anything to worry about. If I was doing full on batch inserts and only reach 20-30% lock percentage I'd be wondering why that lock percentage isn't higher.
[14:25:14] <remonvv> He's doing safe writes too, so a successful getLastError() is invoked for every save.
[14:25:47] <remonvv> There don't seem to be any cursors involved, and even if there are, there shouldn't be situations where something is struggling hard enough to cause 100-second delays somewhere.
[14:27:10] <Nodex> If he is using mod_php / apache then it's probably not closing the old socket to mongo
[14:27:23] <Nodex> and then in the next iteration down the line, > 100s later, it's throwing the error
[14:27:42] <Nodex> because it -thinks- it's timing out but it's just an old unclosed socket
[14:27:54] <remonvv> That aside, the PHP/C code there seems questionable, unless it really is true that when select() returns 0 the only possible path is a cursor timeout rather than some other sort of issue.
[15:31:57] <dorong> Is it safe to use the ObjectId as what's being passed from the client to the server?
[15:32:06] <dorong> for example, let's say the user sees a list of articles in the site, and he wants to filter the list to only see a specific category of articles. each category has its ObjectId (same as using the autoincrement id in a mysql-like database). so once the user chooses the category by which he wants to filter, the ObjectId of that category will be sent to the server.
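ObjectIds are not secrets, so sending one to the browser is fine; the server just has to validate the string and enforce authorization. A hedged sketch of the category-filter round trip dorong describes (collection and field names hypothetical):

```python
# Hedged sketch: validate a client-supplied ObjectId string before querying.
from bson import ObjectId
from bson.errors import InvalidId
from pymongo import MongoClient

db = MongoClient().site

def articles_for_category(category_id_str):
    try:
        category_id = ObjectId(category_id_str)   # rejects malformed input
    except InvalidId:
        return []
    return list(db.articles.find({"category_id": category_id}))
```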
[15:38:13] <saml> oh, javascript Date() takes milliseconds, not seconds
[15:42:15] <X-Jester> FrankBullitt: did you ever get an answer to last night's question about object types?
[15:43:56] <FrankBullitt> X-Jester, thanks for following up. I did not get an answer and my reading suggests that there are no document types to speak of.
[15:44:03] <X-Jester> FrankBullitt: after more careful reading, i think you are going to have to write serializers (to_mongo) and deserializers (from_mongo) for your classes
[15:44:36] <X-Jester> FrankBullitt: if your language's mongo driver can't translate your type into a valid BSON type, you'll have to break it down into a document format that mongo can understand, and have your code serialize and deserialize itself
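A minimal sketch of the serializer convention X-Jester describes, written in Python for consistency; the "_type" tag and the class names are hypothetical, since MongoDB itself does not track object types:

```python
# Minimal sketch of the to_mongo/from_mongo convention: the application
# tags each document with "_type" and picks the right class on the way out.
class HttpEvent:
    def __init__(self, host, path):
        self.host, self.path = host, path

    def to_mongo(self):
        return {"_type": "http", "host": self.host, "path": self.path}

    @classmethod
    def from_mongo(cls, doc):
        return cls(doc["host"], doc["path"])

TYPES = {"http": HttpEvent}   # registry: _type tag -> class

def load(doc):
    return TYPES[doc["_type"]].from_mongo(doc)
```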
[15:46:53] <FrankBullitt> X-Jester, the exact problem I'm trying to solve is to dump a lot of network logs into a single collection; they're all network events. But depending on whether it's a dns request, or an http request, or whatever, I want to parse it a little differently...
[15:47:41] <FrankBullitt> X-Jester, so, it seems I can keep them in different collections, give them an attribute that specifies what type they are, or if Mongo could track an object type, I could essentially have a network superclass and a dns/http subclass
[15:48:15] <FrankBullitt> not a network superclass, an event superclass
[15:48:54] <FrankBullitt> Of these, the most practical seems to be keeping them in separate collections
[15:50:05] <FrankBullitt> Although, the feature I would lose is that if I wanted to search a single "events" collection, I couldn't; I'd have to search http, dns, etc.
[15:50:59] <FrankBullitt> The reason I can't just throw a type attribute on there is because the logging scripts for http, dns already have a JSON/BSON output that doesn't incorporate that
[15:51:16] <FrankBullitt> This is my rationale so far, but I am quite open to any input :)
[15:55:10] <FrankBullitt> I learned what NoSQL was yesterday, so my brain isn't quite rewired for it yet, and I may be fundamentally off base.
[15:56:14] <Nodex> what are you trying to achieve?
[15:56:42] <gyre007> what's the default priority of a server in a replica set, guys?
[16:04:14] <Derick> stdim: yes to "query" but no to "modify as normal docs"
[16:04:34] <Nodex> you can use the positional operator to affect part of the embedded array
[16:05:57] <FrankBullitt> Nodex, X-Jester, I have a library that outputs this http log http://pastebin.com/FR4kWDW2, I want to log this as an "event" and store it with other events, such as DNS. But once this particular JSON is input into MongoDB, how can I tell it's an http event when I pull it out?
[16:06:49] <stdim> So I can do a find() but I can't modify them in place?
[16:07:13] <Derick> stdim: you can do little modifications, but not the same as a "normal document"
[16:07:36] <Derick> the find is also on "nested_field._id" and not "_id"
[16:08:56] <stdim> Uh-huh, and can I reference them from normal documents?
[16:09:13] <Derick> I don't understand that question
[16:09:29] <Derick> embedded docs are not standalone docs... and can't be treated like them
[16:09:45] <stdim> For example, I have an array of parents, and children that are normal documents. Can the children have a 'parent_id' property with an ID pointing to one of the parents in the array?
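Manual references like stdim describes are a standard pattern (and they answer shmoon's earlier question too: store the ObjectId itself, not its string form, so lookups match by type). A simplified hedged sketch in which the parents are standalone documents:

```python
# Hedged sketch of manual references: children carry a parent_id holding
# the parent's ObjectId; resolving it takes a second query (no joins).
from pymongo import MongoClient

db = MongoClient().test

parent_id = db.parents.insert_one({"name": "p1"}).inserted_id
db.children.insert_one({"name": "c1", "parent_id": parent_id})

child = db.children.find_one({"name": "c1"})
parent = db.parents.find_one({"_id": child["parent_id"]})
```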
[17:44:42] <klusias> hello. i got stuck with mapreduce :) i have documents ([{"user":1,"param1":"aa"},{"user":1,"param1":""}]), and i want to find out how many documents user:1 has with a filled param1 value... i am not sure what to emit and what to reduce :/
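One way to get klusias's count without writing map and reduce functions is the aggregation framework (which comes up later in this log). A hedged sketch:

```python
# Hedged sketch: per user, count documents with a non-empty param1,
# using the aggregation framework instead of mapReduce.
from pymongo import MongoClient

db = MongoClient().test
pipeline = [
    {"$match": {"param1": {"$exists": True, "$ne": ""}}},   # filled only
    {"$group": {"_id": "$user", "filled": {"$sum": 1}}},
]
for row in db.docs.aggregate(pipeline):
    print(row["_id"], row["filled"])
```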
[19:51:59] <shantanoo> hi, quick question. can i delete the rows, if the row doesn't contain the given column?
[19:52:45] <kali> use remove() with $exists: false
[19:52:55] <kali> and we prefer the terms documents and fields :)
[19:53:04] <shantanoo> there are 2 records: {'name':'a'} and {'name':'a', 'phone':'p'}. i want to delete the former
[19:56:54] <FrankBullitt> If I insert a document with this JSON schema/values, http://pastebin.com/FR4kWDW2, how do I differentiate this kind of document from other documents when I query the collection?
[19:58:36] <shantanoo> kali: problem solved. thanks for the help
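kali's suggestion in pymongo form, using shantanoo's two example documents:

```python
# Delete every document that lacks a "phone" field. (remove() in older
# drivers; delete_many() in pymongo 3+.)
from pymongo import MongoClient

db = MongoClient().test
result = db.people.delete_many({"phone": {"$exists": False}})
print(result.deleted_count)   # deletes {'name':'a'} but keeps the other
```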
[20:00:51] <FrankBullitt> i.e., how do I differentiate from other document schemas/types, not other documents
[20:15:45] <trevorfrancis> We are looking to migrate away from Couchbase to more OLAP-oriented storage for calculating aggregates on time-series data
[20:16:00] <trevorfrancis> we have been recommended to use Mongo, but we are also looking at Redshift
[20:17:02] <kali> wow... olap on mongodb ? i love mongodb, but i would not use it for olap
[20:30:48] <trevorfrancis> so mongodb is more oltp, correct?
[20:31:07] <trevorfrancis> not designed for analytical workload, etc.
[20:31:44] <trevorfrancis> i.e., storing large volumes of log data and running near-realtime analytics on them
[20:32:26] <kali> mongo is focused on short lived queries for real time user interaction
[20:33:55] <trevorfrancis> I was thinking of materialized views for analytical information
[20:34:19] <trevorfrancis> i assume it shouldn't be used for data warehousing or anything like that either, correct?
[20:34:31] <trevorfrancis> i.e. storing billions of records for long periods of time
[20:35:35] <kali> that could work, but it's not designed for that.
[20:36:25] <kali> the sweet spot is really the thing that goes just right behind your app server :)
[20:37:30] <trevorfrancis> we will be doing 10-20k inserts per second and running analytical queries based on that
[20:37:50] <kali> trevorfrancis: the main issue with olap, imho, is that olap data is really rectangular. mongodb is good at dealing with data of a flexible nature, but its document model has a big overhead compared to column-oriented or row-oriented storage for rectangular data
[20:39:12] <kali> 10-20k inserts is manageable, but you'll need dozens of indexes, and that will slow down the insertion rate
[20:39:58] <kali> for aggregation, the aggregation framework opens some doors
[20:41:32] <kali> shantanoo: ha ! no, then that's the aggregation framework
[20:42:24] <shantanoo> kali: thanks. looking into http://api.mongodb.org/python/current/examples/aggregation.html
[20:44:08] <trevorfrancis> kali: we are going to run standard queries on a subset of data and run aggregates (sum, avg, etc.)
[20:44:52] <trevorfrancis> much like Couchbase, I assume we prepare the view ahead of time and it structures the data as it comes in, to be able to run a query on it?
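Unlike Couchbase views, nothing is prepared ahead of time: aggregates are computed at query time by the aggregation framework, or pre-computed by the application into summary documents. A hedged sketch of a query-time sum/avg over hypothetical time-series fields `ts` and `value`:

```python
# Hedged sketch: query-time sum/avg bucketed per hour and minute; the
# pipeline runs on demand rather than maintaining a materialized view.
from datetime import datetime
from pymongo import MongoClient

db = MongoClient().metrics
pipeline = [
    {"$match": {"ts": {"$gte": datetime(2013, 1, 1)}}},
    {"$group": {
        "_id": {"hour": {"$hour": "$ts"}, "minute": {"$minute": "$ts"}},
        "total": {"$sum": "$value"},
        "average": {"$avg": "$value"},
    }},
]
for bucket in db.samples.aggregate(pipeline):
    print(bucket)
```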
[21:14:45] <shantanoo> anyone around who uses the pymongo python module? aggregate currently returns the data in a list. is it possible to get the data as an iterator?
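At the time of this log, pymongo's aggregate() returned the whole result inline; MongoDB 2.6 with pymongo 2.6+ added a cursor option, and pymongo 3+ returns an iterable CommandCursor by default. A hedged sketch:

```python
# Hedged sketch: request a cursor from aggregate() instead of an inline list
# (MongoDB 2.6+ / pymongo 2.6+ API; pymongo 3+ does this by default).
from pymongo import MongoClient

db = MongoClient().test
pipeline = [{"$group": {"_id": "$user", "n": {"$sum": 1}}}]

for doc in db.things.aggregate(pipeline, cursor={}):
    print(doc)   # batches are fetched lazily as you iterate
```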
[21:51:35] <bobinator60> $elemMatch projections only return the first element. is there a way to get all of the elements in the contained array which match?
[22:04:27] <Derick> coogle: sorry, something came up
[22:05:13] <bobinator60> $elemMatch projections only return the first element. is there a way to get all of the elements in the contained array which match?
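That first-element behavior is by design for $elemMatch projections; the usual workaround at the time was the aggregation framework, unwinding the array and keeping only the matching elements. A hedged sketch with hypothetical fields `items` and `qty`:

```python
# Hedged sketch: return *all* matching array elements, not just the first.
from pymongo import MongoClient

db = MongoClient().test
pipeline = [
    {"$match": {"items.qty": {"$gt": 10}}},    # docs with at least one match
    {"$unwind": "$items"},                     # one doc per array element
    {"$match": {"items.qty": {"$gt": 10}}},    # keep matching elements only
    {"$group": {"_id": "$_id", "items": {"$push": "$items"}}},
]
for doc in db.orders.aggregate(pipeline):
    print(doc)
```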
[22:16:49] <nDuff> Is MongoDB a reasonable choice for coordinating a distributed semaphore (with expiring leases)? Anyone have advice (re: gotchas / avoiding race conditions) or a pointer to a preexisting implementation?
[22:36:49] <bean__> nDuff: if it's a "cluster" the data doesn't sync right away
[22:44:02] <nDuff> bean__: How does a client select a node to read from? If there was an option to only fall back to slaves when the master was unavailable, that might be (usually) tolerable.
[22:44:19] <bean__> nDuff: I believe that is the default for replicasets
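A hedged sketch of the expiring-lease idea (pymongo 3+ names, all identifiers hypothetical): one atomic find-and-modify so at most one client holds a live lease, which is only safe when reads and writes go to the primary, per bean__'s point:

```python
# Hedged sketch: an expiring lease via one atomic update on the primary.
# Assumes a lease document per lock was created up front, e.g.
#   db.leases.insert_one({"_id": "my-lock", "expires": datetime.min})
from datetime import datetime, timedelta
from pymongo import MongoClient, ReturnDocument

db = MongoClient().locks

def acquire(name, owner, ttl_seconds=30):
    now = datetime.utcnow()
    doc = db.leases.find_one_and_update(
        {"_id": name, "expires": {"$lt": now}},   # only if free or expired
        {"$set": {"owner": owner,
                  "expires": now + timedelta(seconds=ttl_seconds)}},
        return_document=ReturnDocument.AFTER,
    )
    return doc is not None   # None: someone else holds a live lease
```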
[22:46:22] <FrankBullitt> If I insert a document with this JSON schema/values representing an http request here, http://pastebin.com/FR4kWDW2, is there a way to determine if it is an http request document vs another type of document when I pull it out of Mongo?
[22:47:24] <Derick> FrankBullitt: you need to store the type as well then
[22:47:43] <Derick> or check whether http_host exists f.e.
[22:49:56] <FrankBullitt> Derick: Would I store the type as an attribute in the document or does Mongo have any way to track document/object types?
[22:52:24] <FrankBullitt> Derick, awesome, thank you so much, been asking all day.
[22:52:27] <Derick> alternatively, you can use a different collection for each document type
[22:52:40] <Derick> (but that means that you need to run more than one query to search through each type)
[22:53:09] <FrankBullitt> Derick, yeah, I really want to use the strength of the schema-less system and store a bunch of different events in a single collection.
[22:53:25] <Derick> then you need to store the type as an attribute :-)
[22:54:17] <FrankBullitt> Derick, except... :-) I don't control that output, so I think I will need to programmatically check the attributes of the document as you mentioned
[22:54:55] <FrankBullitt> A third-party library outputs that JSON
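Derick's two options in sketch form: the import script can tag the type on the way in even though the third-party JSON lacks one, or the reader can infer the type from a telltale field such as http_host (the dns field name below is hypothetical):

```python
# Hedged sketch of both options: tag at insert time, or classify by shape.
import json
from pymongo import MongoClient

db = MongoClient().logs

def import_event(json_line, event_type):
    doc = json.loads(json_line)
    doc["type"] = event_type          # option 1: tag it on the way in
    db.events.insert_one(doc)

def classify(doc):
    if "http_host" in doc:            # option 2: infer from a telltale field
        return "http"
    if "dns_query" in doc:            # hypothetical dns field
        return "dns"
    return "unknown"
```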
[23:05:43] <fishfish> is there a way to see a list of live queries on the server
[23:05:49] <fishfish> or even a query log i could tail
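Two era-appropriate answers: db.currentOp() in the shell lists in-flight operations, and the profiler records finished ones in system.profile, which can then be queried or tailed. A hedged pymongo sketch of the profiler route:

```python
# Hedged sketch: profile every operation (level 2) and read recent entries.
from pymongo import MongoClient, DESCENDING

db = MongoClient().test
db.command("profile", 2)   # 0 = off, 1 = slow ops only, 2 = all ops

for op in db.system.profile.find().sort("ts", DESCENDING).limit(10):
    print(op.get("op"), op.get("ns"), op.get("millis"))
```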
[23:06:16] <hekman> I'm using the MongoReplicaSetClient in pymongo, and I'm trying to figure out how it does the "ping" as mentioned here: http://api.mongodb.org/python/current/examples/high_availability.html#secondary-reads … is it ICMP, or some protocol specific to mongo?
[23:32:54] <FrankBullitt> Can anyone recommend a tool for following a non-JSON log file, converting the lines to JSON, and importing them into MongoDB? The log file is tab-separated ASCII.
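mongoimport handles one-shot TSV with --type tsv but does not follow a growing file, so a small tail-and-insert script is a common answer. A hedged sketch with hypothetical column names and path:

```python
# Hedged sketch: follow a growing tab-separated log file and insert each
# new line as a document. Columns and the file path are hypothetical.
import time
from pymongo import MongoClient

COLUMNS = ["ts", "src", "dst", "bytes"]
coll = MongoClient().logs.events

with open("/var/log/events.tsv") as f:
    f.seek(0, 2)                      # start at end of file, like tail -f
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.5)
            continue
        coll.insert_one(dict(zip(COLUMNS, line.rstrip("\n").split("\t"))))
```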
[23:53:31] <gyre007> arrrgh... we need to resize our mongo instances... if we are running a replica set, is it safe to restart them one after another? it should be, right?