Optimizing MongoDB Indexes

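Good indexes are essential for an application to perform well on MongoDB. MongoDB performs best when it can keep your indexes in RAM. Smaller indexes also mean faster queries and let you manage more data with less memory.

Here are some tips for reducing the size of your MongoDB indexes: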

1) Check the size of your indexes

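The first thing to do is find out how big your indexes are. You want to know the current size before you change anything, so that you can check afterwards whether the change actually reduced it. Ideally, you are already graphing index size with your monitoring tools.

In the mongo shell, running db.stats() returns index statistics for the whole database: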

> db.stats()
{
    "db" : "examples1",
    "collections" : 6,
    "objects" : 403787,
    "avgObjSize" : 121.9966467469235,
    "dataSize" : 49260660,
    "storageSize" : 66695168,
    "numExtents" : 20,
    "indexes" : 9,
    "indexSize" : 48524560,
    "fileSize" : 520093696,
    "nsSizeMB" : 16,
    "ok" : 1
}

indexes: the number of indexes in the examples1 database
indexSize: the total size of the indexes in the examples1 database

Each collection has its own indexes, so you can also check them per collection with db.collection.stats():

> db.address.stats()
{
    "ns" : "examples1.address",
    "count" : 3,
    "size" : 276,
    "avgObjSize" : 92,
    "storageSize" : 8192,
    "numExtents" : 1,
    "nindexes" : 2,
    "lastExtentSize" : 8192,
    "paddingFactor" : 1,
    "flags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "_types_1" : 8176
    },
    "ok" : 1
}

totalIndexSize: the size of all indexes on the collection
indexSizes: a dictionary mapping each index name to its size

Note: all sizes returned by these commands are in bytes.
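The per-index figures in indexSizes can be turned into a small ranking. The following is a minimal sketch in Python, assuming the stats output is already available as a dictionary shaped like the db.address.stats() result above. The helper names human_size and rank_indexes are made up for this illustration and are not part of MongoDB or any driver.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (B, K, M, G)."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            break
        size /= 1024
    return f"{size:.2f}{unit}"


def rank_indexes(coll_stats):
    """Return (index name, size in bytes, share of total) rows, largest first."""
    sizes = coll_stats["indexSizes"]
    total = sum(sizes.values())
    rows = [
        (name, size, size / total if total else 0.0)
        for name, size in sizes.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Values copied from the db.address.stats() output above.
stats = {
    "ns": "examples1.address",
    "totalIndexSize": 16352,
    "indexSizes": {"_id_": 8176, "_types_1": 8176},
}

for name, size, share in rank_indexes(stats):
    print(f"{name:<12} {human_size(size):>8} {share:6.1%}")
```

Both indexes here are 8176 bytes, which formats as 7.98K, the same figure the report further down shows for this collection.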

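These commands are useful, but running them by hand is tedious. To make this easier, I wrote a tool, index-stats.py, that generates a report of index statistics. You can find it in the mongodb-tools project on GitHub.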

(virtualenv) mongodb-tools$ ./index-stats.py
Checking DB: examples2.system.indexes
Checking DB: examples2.things
Checking DB: examples1.system.indexes
Checking DB: examples1.address
Checking DB: examples1.typeless_address
Checking DB: examples1.user
Checking DB: examples1.typeless_user

Index Overview
+----------------------------+------------------------+--------+------------+
| Collection                 | Index                  | % Size | Index Size |
+----------------------------+------------------------+--------+------------+
| examples1.address          | _id_                   |   0.0% |      7.98K |
| examples1.address          | _types_1               |   0.0% |      7.98K |
| examples1.typeless_address | _id_                   |   0.0% |      7.98K |
| examples1.typeless_user    | _id_                   |  10.1% |      6.21M |
| examples1.typeless_user    | address_id_1           |  10.1% |      6.21M |
| examples1.typeless_user    | typeless_address_ref_1 |   5.9% |      3.62M |
| examples1.user             | _id_                   |  10.1% |      6.21M |
| examples1.user             | _types_1               |   6.9% |      4.24M |
| examples1.user             | _types_1_address_id_1  |  12.2% |      7.51M |
| examples1.user             | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples2.things           | _id_                   |  10.1% |      6.21M |
| examples2.things           | _types_1               |   8.4% |      5.13M |
+----------------------------+------------------------+--------+------------+

Top 5 Largest Indexes
+----------------------------+------------------------+--------+------------+
| Collection                 | Index                  | % Size | Index Size |
+----------------------------+------------------------+--------+------------+
| examples1.user             | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples1.user             | _types_1_address_id_1  |  12.2% |      7.51M |
| examples1.typeless_user    | _id_                   |  10.1% |      6.21M |
| examples2.things           | _types_1               |   8.4% |      5.13M |
| examples1.user             | _types_1               |   6.9% |      4.24M |
+----------------------------+------------------------+--------+------------+

Total Documents: 600016
Total Data Size: 74.77M
Total Index Size: 61.43M
RAM Headroom: 2.84G
Available RAM Headroom: 1.04G

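The output shows the total index size, the size of each index, and their relative sizes. The report also lists the five largest indexes across all your collections, which makes it easy to spot the biggest ones and to see which index would contribute the most to reducing the overall size.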

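RAM Headroom is your physical memory minus the index size. A healthy value means you have enough RAM available to hold your indexes in memory.
Available RAM Headroom is free memory minus the index size. Because other processes on this system are also using memory, I don't have the whole RAM Headroom available.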
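Both headroom figures are the same subtraction applied to a different memory total. A minimal sketch in Python follows; the function name ram_headroom and the memory figures are hypothetical and are not the values from the report above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ram_headroom(memory_bytes, index_bytes):
    """Memory left once the indexes are held in RAM (negative if they don't fit)."""
    return memory_bytes - index_bytes


# Hypothetical machine: 4 GiB installed, 1 GiB currently free, 512 MiB of indexes.
physical = 4 * GIB
free = 1 * GIB
index_size = 512 * MIB

print(ram_headroom(physical, index_size) / GIB)  # RAM Headroom, in GiB
print(ram_headroom(free, index_size) / GIB)      # Available RAM Headroom, in GiB
```

A negative result would mean the indexes no longer fit in the memory being considered.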

The idea of reporting RAM headroom comes from the MongoDB monitoring service I use, ServerDensity.

From this output, I can immediately focus on the examples1.user collection and its _types_1_address_ref_1 and _types_1_address_id_1 indexes.

2) Remove redundant indexes

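If you have been releasing and modifying your code for a while, you may have ended up with redundant indexes. MongoDB can use a prefix of a compound index when a query does not supply all of the index's fields. In the earlier output, this index: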

| examples1.user | _types_1 | 6.9% | 4.24M |

is redundant with:

| examples1.user | _types_1_address_ref_1 | 26.2% | 16.09M |

| examples1.user | _types_1_address_id_1 | 12.2% | 7.51M |

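because _types_1 is a prefix of both of these indexes. Dropping it saves about 4.2M of total index size and leaves one fewer index to update whenever user documents change.

To make such indexes easier to find, you can run redundant-indexes.py from mongodb-tools: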

(virtualenv) mongodb-tools$ ./redundant-indexes.py
Checking DB: examples2
Checking DB: examples1
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_ref_1]
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_id_1]
Checking DB: local
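The prefix rule behind this check is easy to express in code. The sketch below is a simplified illustration in Python and is not the source of redundant-indexes.py. It assumes each index is described by an ordered list of (field, direction) pairs, and the function names are hypothetical.

```python
def is_prefix(short_keys, long_keys):
    """True if short_keys is a strict leading prefix of long_keys."""
    return (
        len(short_keys) < len(long_keys)
        and long_keys[:len(short_keys)] == short_keys
    )


def find_redundant(indexes):
    """Return (redundant index, covering index) pairs.

    indexes maps an index name to its ordered list of (field, direction) keys.
    """
    found = []
    for name, keys in indexes.items():
        for other, other_keys in indexes.items():
            if name != other and is_prefix(keys, other_keys):
                found.append((name, other))
    return found


# Index definitions matching the examples1.user collection in the report.
user_indexes = {
    "_id_": [("_id", 1)],
    "_types_1": [("_types", 1)],
    "_types_1_address_ref_1": [("_types", 1), ("address_ref", 1)],
    "_types_1_address_id_1": [("_types", 1), ("address_id", 1)],
}

for short, covering in find_redundant(user_indexes):
    print(f"Index {short} may be redundant with {covering}")
```

This sketch compares keys only. An index with options such as unique still enforces a constraint even when its keys are a prefix of another index, so it should not be dropped on this basis alone.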

3) Run the compact command

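If you are running MongoDB 2.0+, you can run the compact command to defragment your collections and rebuild their indexes. The compact command locks the database, so make sure you know exactly where you are running it. If you are running a replica set, the easiest approach is to run it on your secondaries one at a time, then fail the primary over to a new secondary and run compact on the old primary.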
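The rolling order described above can be written down as a small planning helper. This is only a sketch in Python under stated assumptions: the host names are hypothetical, the function compaction_plan is made up for illustration, and it only produces the order of steps without connecting to MongoDB. The command document {"compact": <collection>} is the form the compact command takes.

```python
def compaction_plan(members, collection):
    """Order the work so that secondaries are compacted before the primary.

    members is a list of (host, role) pairs, where role is "PRIMARY" or
    "SECONDARY". Each step is a (host, action) pair; action is either a
    command document to run on that host or a note for the operator.
    """
    command = {"compact": collection}
    secondaries = [host for host, role in members if role == "SECONDARY"]
    primaries = [host for host, role in members if role == "PRIMARY"]

    steps = [(host, command) for host in secondaries]
    for host in primaries:
        steps.append((host, "fail over to a compacted secondary"))
        steps.append((host, command))
    return steps


# Hypothetical three-member replica set.
members = [
    ("db1.example.com", "PRIMARY"),
    ("db2.example.com", "SECONDARY"),
    ("db3.example.com", "SECONDARY"),
]

for host, action in compaction_plan(members, "user"):
    print(host, action)
```

Keeping the plan separate from execution makes it possible to review the order before anything locks a database.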

4) MongoDB 2.0 index improvements

If you are not yet running MongoDB 2.0 or later, upgrading and rebuilding your indexes should give you about 25% space savings.

See Index Performance Enhancements.

5) Check your indexing schema

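Another thing to check is the schema of what you index. You want indexed values to be small and selective. Indexing values that don't help MongoDB find your data faster slows queries down and increases index size. If your application uses a mapping framework that supports defining indexes in code, you should look at how it actually creates those indexes. For example, MongoEngine for Python uses a "_types" field to identify subclasses stored in the same collection. This can add a lot of space to your indexes and may not add anything to their selectivity.

In my test data, my largest index is: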

| examples1.user | _types_1_address_ref_1 | 26.2%

Looking at its data:

> db.user.findOne()
{
    "_id" : ObjectId("4f2ef95c89a40a11c5000002"),
    "_types" : [
        "User"
    ],
    "address_id" : ObjectId("4f2ef95c89a40a11c5000000"),
    "address_ref" : {
        "$ref" : "address",
        "$id" : ObjectId("4f2ef95c89a40a11c5000000")
    },
    "_cls" : "User"
}

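You can see that _types is an array containing the class name User. Since my code has no subclasses of User, indexing this value adds nothing to the index's selectivity. On top of that, every value in the related indexes is prefixed with "User", which adds extra bytes to each entry without making the index any more selective.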
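One rough way to see how little a field contributes is to count its distinct values against the number of documents. The sketch below is a simplified illustration in Python over in-memory sample documents. The function name selectivity and the sample data are hypothetical, and it treats an array value as a single key, which is a simplification of how MongoDB indexes arrays.

```python
def selectivity(docs, field):
    """Distinct values of `field` divided by the number of documents."""
    if not docs:
        return 0.0
    values = set()
    for doc in docs:
        value = doc.get(field)
        # Lists are not hashable, so store them as tuples.
        values.add(tuple(value) if isinstance(value, list) else value)
    return len(values) / len(docs)


# Hypothetical sample: every document is a plain User with its own address.
docs = [{"_types": ["User"], "address_id": n} for n in range(1000)]

print(selectivity(docs, "_types"))      # 0.001
print(selectivity(docs, "address_id"))  # 1.0
```

A value close to 1.0 means the field narrows a lookup to very few documents, while a value close to 0 means almost every document shares the same key.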

You can remove it with the following code:

class User(Document):
    meta = {'index_types': False}

The index then becomes:

| examples1.user | address_ref_1 | 16.8% |

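That is a 23% space saving.

Digging further, address_ref_1 is a ReferenceProperty to an Address object. The document above shows that it is stored as a dictionary containing the name of the referenced collection and the id it points to. If we change this ReferenceProperty to an ObjectIdProperty, as address_id is, we get additional savings: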

| examples1.user | address_id_1 | 9.5% | 6.21M |
| examples1.user | address_ref_1 | 20.9% |

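That is a 53% saving. It comes from changing the indexed value from a serialized dictionary to an ObjectId, which MongoDB handles in a highly optimized way. Changing the property type does require code changes, and you lose the automatic dereferencing that ReferenceProperty provides, but it can save a lot of memory.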
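The difference between the two value types can be estimated from the BSON encoding rules. The following is a rough sketch in Python that computes raw BSON value sizes by hand. It assumes the reference is stored as the two-field subdocument shown in the findOne() output above, the helper names are made up for illustration, and the on-disk index entry format is not identical to raw BSON, so treat the numbers as a comparison rather than exact index sizes.

```python
OBJECT_ID_BYTES = 12  # a BSON ObjectId value is always 12 bytes


def bson_string_bytes(text):
    """BSON string value: int32 length prefix + UTF-8 bytes + trailing NUL."""
    return 4 + len(text.encode("utf-8")) + 1


def bson_element_bytes(name, value_bytes):
    """BSON element: type byte + NUL-terminated field name + value."""
    return 1 + len(name.encode("utf-8")) + 1 + value_bytes


def dbref_bytes(collection):
    """Size of the subdocument {"$ref": collection, "$id": ObjectId(...)}."""
    elements = (
        bson_element_bytes("$ref", bson_string_bytes(collection))
        + bson_element_bytes("$id", OBJECT_ID_BYTES)
    )
    # int32 document length + elements + terminating NUL
    return 4 + elements + 1


print(OBJECT_ID_BYTES)         # 12
print(dbref_bytes("address"))  # 40
```

Under these assumptions the reference value is more than three times the size of a bare ObjectId, and it grows with the length of the collection name.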

All told, we cut the index storage by 61% by adjusting the indexing schema and changing a small amount of code.

6) Remove or move old data

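In many applications, some data is accessed far more often than the rest. If you have old data that your users no longer access, move it to another collection that is not indexed, or store it somewhere outside the database. Ideally, your database holds and indexes only the working set of your data.

There are other good optimizations as well; you can find them here: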
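Deciding what counts as old data usually comes down to a cutoff on some timestamp. The sketch below is a minimal illustration in Python over in-memory documents. The field name last_accessed, the 90-day window, and the sample dates are all assumptions made for this example, and the actual move into an unindexed collection is left out.

```python
from datetime import datetime, timedelta


def split_by_age(docs, cutoff, field="last_accessed"):
    """Split documents into (active, archive) around a cutoff timestamp."""
    active, archive = [], []
    for doc in docs:
        if doc[field] >= cutoff:
            active.append(doc)
        else:
            archive.append(doc)
    return active, archive


# Hypothetical documents and a hypothetical "now".
now = datetime(2012, 3, 1)
docs = [
    {"_id": 1, "last_accessed": datetime(2012, 2, 20)},
    {"_id": 2, "last_accessed": datetime(2011, 6, 1)},
]

active, archive = split_by_age(docs, now - timedelta(days=90))
print([doc["_id"] for doc in active])   # documents to keep indexed
print([doc["_id"] for doc in archive])  # candidates for the archive
```

Only the documents in the first list would stay in the indexed collection, which keeps the indexes close to the size of the working set.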

- MongoDB Performance Tuning
- Optimizing MongoDB: Lessons Learned at Localytics

How do you optimize your indexes?
