This is a discussion from the mongodb-user Google Group. The original poster opens with the following question:
Hi, on our main mongo box we have 4G of memory (wohooo). Our data size is 1G and index size is 5.6G. This box is one node of a 3-node replica set. We have one collection and a few indexes on that collection. This one collection stores a lot of writes from our website.
We are mostly concerned that our write speed does not degrade (right now we're at ~3-4ms/write). The (near) worst-case scenario is that write speed degrades dramatically while mongo is paging.
What things should we be keeping an eye on? Any particular metric to comb for? Would writes (including index updates) still be efficient even though, in our case, index size is greater than RAM?
Thanks.
Below is the reply from a user named Gates; his comments are excellent and well worth reading carefully.
Here's the general advice on scaling:
- *Replica sets* are used for scaling reads and for providing redundancy
- *Sharding* is used for scaling writes
> What things should we be keeping an eye on? any particular metric to comb for?

Metric #1 is to check that index size < RAM size. --- It looks like you're already past that.
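To make metric #1 concrete, here is a minimal pymongo sketch that compares the database's total index size against the box's RAM. The database name is hypothetical and the 4G figure comes from the question above:

```python
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["mydb"]  # hypothetical database name

# dbstats reports aggregate sizes for the database; scale=1024**2 -> MB.
stats = db.command("dbstats", scale=1024 ** 2)
index_mb = stats["indexSize"]
data_mb = stats["dataSize"]
ram_mb = 4 * 1024  # the 4G box from the question

print(f"data: {data_mb} MB, indexes: {index_mb} MB, RAM: {ram_mb} MB")
if index_mb > ram_mb:
    print("indexes no longer fit in RAM -- expect page faults under load")
```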
Metric #2 is to check IO utilization (see iostat). --- You want to keep IO utilization under 100%. By default, MongoDB flushes to disk every 60 seconds, which may show up as IO spikes every minute or so. If you're seeing low utilization with spikes every minute, you may want to set --syncdelay when you run mongod: a lower value means more frequent but smaller flushes, which smooths out the spikes.
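As a sketch of what watching those flushes might look like from the driver side, assuming an MMAPv1-era mongod like the one in this thread, where serverStatus exposes a backgroundFlushing section (field names may differ or be absent on newer servers):

```python
import time
from pymongo import MongoClient

admin = MongoClient("localhost", 27017).admin

# Sample the flush stats a few times; last_ms climbing toward the flush
# interval is the IO pressure the iostat advice above is about.
for _ in range(5):
    flushing = admin.command("serverStatus").get("backgroundFlushing", {})
    print("flushes:", flushing.get("flushes"),
          "last_ms:", flushing.get("last_ms"),
          "average_ms:", flushing.get("average_ms"))
    time.sleep(60)
```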
Metric #3: watch for paging. --- You can often do this with a tool like top. We also have mongostat, which should give you an idea of how often data is being "paged in".
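A rough way to watch paging from the driver, along the lines of the "faults" column mongostat prints: serverStatus reports a cumulative extra_info.page_faults counter on Linux (it may be missing on other platforms), and differencing two samples turns it into a rate. A best-effort sketch:

```python
import time
from pymongo import MongoClient

admin = MongoClient("localhost", 27017).admin

def page_faults():
    # Cumulative page-fault counter; reported on Linux, may be missing elsewhere.
    return admin.command("serverStatus")["extra_info"].get("page_faults", 0)

# Difference two samples one second apart to get a per-second fault rate.
prev = page_faults()
for _ in range(10):
    time.sleep(1)
    cur = page_faults()
    print("page faults/sec:", cur - prev)
    prev = cur
```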
The Problems:
> We are mostly concerned that our write speed does not degrade

--- This is really hard to guarantee once you've already overflowed RAM and the data keeps growing. If you only have one index, then maybe you'll get lucky: MongoDB is pretty good about "writing to the end" of the index when keys are increasing, so old index data just flows out of memory and rarely, if ever, gets re-used.
However, I suspect that you have multiple indexes here. If this is the case, then it's hard to guarantee that we won't be going to disk to fetch those indexes.
The other problem here is queries. At some point you'll want to query this data. When you do, it's quite likely that you're going to force the whole index back into memory and this is going to slow down the whole system.
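One way to see how much index a query drags back in is explain(). A hedged sketch, assuming a collection named events with an index on a ts field; on servers of this thread's era, nscanned counts index entries examined, while newer servers return a different explain format entirely:

```python
from datetime import datetime, timedelta
from pymongo import MongoClient

coll = MongoClient("localhost", 27017)["mydb"]["events"]  # hypothetical names

# Querying a month-old range forces that slice of the ts index back into RAM.
cutoff = datetime.utcnow() - timedelta(days=30)
plan = coll.find({"ts": {"$lt": cutoff}}).explain()
print("cursor:", plan.get("cursor"),
      "index entries scanned:", plan.get("nscanned"),
      "docs returned:", plan.get("n"))
```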
Possible solutions:
#1: Add more RAM -> will delay the problem
#2: Fewer indexes? -> if possible
#3: Break out data
It sounds like you have lots of transactional data. Is it possible to break out the data by hour, by day, or by week? This will make queries a little more difficult, but it will minimize the amount of index data that has to be in memory at any given time.
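A minimal sketch of solution #3, routing writes to per-day collections so that only the current day's index has to stay hot (the database, collection, and field names are hypothetical):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["mydb"]  # hypothetical database name

def daily_collection(now=None):
    # Route each write to a per-day collection, e.g. events_20240115.
    now = now or datetime.now(timezone.utc)
    return db["events_" + now.strftime("%Y%m%d")]

# Writes only ever touch today's small index; older days' indexes can
# fall out of RAM without hurting the write path, at the cost of
# queries having to fan out across the per-day collections.
daily_collection().insert_one({"ts": datetime.now(timezone.utc), "path": "/signup"})
```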
If you have more details on what data you're storing and how you're indexing, we may be able to provide additional guidance.
- Gates