
replSet error RS102 too stale to catch up — how can this be fixed?
turn:1 reslen:155 0ms
Thu Jul 26 09:39:54 [conn2940] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:54 [conn2940] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:55 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:55 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:56 [conn2940] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:56 [conn2940] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:57 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:57 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:58 [rsSync] replSet syncing to: 192.168.30.103:27017
Thu Jul 26 09:39:58 BackgroundJob starting: ConnectBG
Thu Jul 26 09:39:58 [rsSync] replHandshake res not: 1 res: { ok: 1.0 }
Thu Jul 26 09:39:58 [rsSync] replSet error RS102 too stale to catch up, at least from 192.168.30.103:27017
Thu Jul 26 09:39:58 [rsSync] replSet our last optime : Jul 20 21:40:18 5
Thu Jul 26 09:39:58 [rsSync] replSet oldest at 192.168.30.103:27017 : Jul 25 15:28:41 500fa029:262a
Thu Jul 26 09:39:58 [rsSync] replSet See
Thu Jul 26 09:39:58 [rsSync] replSet error RS102 too stale to catch up
Thu Jul 26 09:39:58 [journal] lsn set
Thu Jul 26 09:39:58 [conn2940] end connection 192.168.30.33:59026
Thu Jul 26 09:39:58 [initandlisten] connection accepted from 192.168.30.33:5
Thu Jul 26 09:39:58 [conn2942] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:58 [conn2942] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:59 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:59 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:00 [conn2942] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:40:00 [conn2942] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:00 [conn42] run command admin.$cmd { ismaster: 1 }
Thu Jul 26 09:40:00 [conn42] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 reslen:274 0ms
Thu Jul 26 09:40:00 [conn42] run command admin.$cmd { replSetGetStatus: 1 }
Thu Jul 26 09:40:00 [conn42] command admin.$cmd command: { replSetGetStatus: 1 } ntoreturn:1 reslen:631 0ms
Thu Jul 26 09:40:01 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:40:01 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:02 [conn2942] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:40:02 [conn2942] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:03 BackgroundJob starting: ConnectBG
Thu Jul 26 09:40:03 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:40:03 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:04 [conn2942] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:40:04 [conn2942] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:04 BackgroundJob starting: ConnectBG
Thu Jul 26 09:40:05 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:40:05 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:40:06 [DataFileSync] flushing mmap took 0ms for 5 files
Thu Jul 26 09:40:06 [PeriodicTask::Runner] task: WriteBackManager::cleaner took: 0ms
Thu Jul 26 09:40:06 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
Thu Jul 26 09:40:06 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
PRIMARY> rs.status()
{
        "set" : "shard1",
        "date" : ISODate("T02:26:03Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.30.31:27017",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 46826,
                        "optime" : {
                                "t" : 0,
                                "i" : 562
                        },
                        "optimeDate" : ISODate("T13:40:18Z"),
                        "lastHeartbeat" : ISODate("T02:26:02Z"),
                        "pingMs" : 0,
                        "errmsg" : "error RS102 too stale to catch up"
                },
                {
                        "_id" : 1,
                        "name" : "192.168.30.103:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "optime" : {
                                "t" : 0,
                                "i" : 549
                        },
                        "optimeDate" : ISODate("T09:21:50Z"),
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.30.33:27017",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 46804,
                        "optime" : {
                                "t" : 0,
                                "i" : 0
                        },
                        "optimeDate" : ISODate("T00:00:00Z"),
                        "lastHeartbeat" : ISODate("T02:26:02Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}
An accepted answer (quoted from Stack Overflow):
You don't need to repair; simply perform a full resync.
On the secondary, you can:
    stop the failed mongod
    delete all data in the dbpath (including subdirectories)
    restart it, and it will automatically resynchronize itself
Follow the instructions here. A sketch for watching the resync from the mongo shell follows below.
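The stop/wipe/restart itself happens at the operating-system level; once the emptied secondary is back up you can watch its progress from the mongo shell. A minimal sketch, run while connected to any member of the set (nothing in it is specific to this cluster):

// Print each member's current state. While the wiped node is doing its initial
// sync it reports STARTUP2 or RECOVERING; once it shows SECONDARY the resync is done.
rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr + (m.errmsg ? "  (" + m.errmsg + ")" : ""));
});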
What's happened in your case is that your secondaries have become stale: there is no longer any common point between their oplog and the oplog on the primary. Look at this document, which details the various member states. Writes to the primary have to be replicated to the secondaries; your secondaries couldn't keep up and eventually went stale. You will need to consider resizing your oplog.
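Before (or instead of) resizing, it is worth checking how much time the primary's oplog currently covers; if that window is shorter than the longest outage a secondary might suffer, RS102 will recur. A minimal sketch using the standard shell helpers, run on the primary:

// Prints the configured oplog size and the time span between its first and last entry.
db.printReplicationInfo()

// The same numbers as an object, if you want to script against them:
var info = db.getReplicationInfo();
print("oplog size (MB): " + info.logSizeMB + ", window (hours): " + info.timeDiffHours);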
Regarding oplog size, it depends on how much data you insert/update over time. I would choose a size which allows you many hours or even days of oplog.
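On the MongoDB versions in this thread (2.x and 3.2), changing the oplog size means taking the member aside and rebuilding the oplog, much as the 51CTO post further down does. On 3.6 and newer with WiredTiger there is a live command for it; a hedged sketch, with 16384 MB as an arbitrary example value:

// MongoDB 3.6+ only: resizes the oplog of the member you are connected to (size in MB).
db.adminCommand({ replSetResizeOplog: 1, size: 16384 })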
Additionally, I'm not sure which O/S you are running. However, for 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog. If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space. For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog and for 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
How big are your records, and how many do you write? It depends on whether this insertion rate is typical for your workload or an abnormal burst you were merely testing.
For example, at 2000 documents per second for documents of 1KB, that would net you 120MB per minute and your 5GB oplog would last about 40 minutes. This means if the secondary ever goes offline for 40 minutes or falls behind by more than that, then you are stale and have to do a full resync.
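The same back-of-envelope arithmetic as a tiny shell calculation, so you can plug in your own numbers (the 2000 docs/s, 1 KB and 5 GB figures are just the example above):

// Rough estimate of how long an oplog of a given size lasts at a given write rate.
var docsPerSec = 2000, docKB = 1, oplogGB = 5;
var mbPerMin = docsPerSec * docKB * 60 / 1024;   // about 117 MB written per minute
var minutes  = oplogGB * 1024 / mbPerMin;        // about 44 minutes of oplog
print(mbPerMin.toFixed(0) + " MB/min -> oplog covers about " + minutes.toFixed(0) + " minutes");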
I recommend reading the Replica Set Internals document here. Also check how many voting members your replica set has: electing a primary needs an odd number of voters, so if the count is even you should add an arbiter or another secondary, or remove one of your secondaries.
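If the set really has ended up with an even number of voting members, the usual fix from the mongo shell is to add an arbiter; a sketch, where the hostname is a placeholder and not part of this thread's cluster:

// Adds a voting, data-less arbiter so the election has an odd number of voters.
rs.addArb("mongo-arb.example.net:27017")
rs.status().members.length   // re-check the member count afterwards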
Finally, here's a detailed document on RS administration.
Resyncing a very stale MongoDB replica (from an ITeye blog post)
Logging in to the replica set's primary and running rs.status() shows the following:
"_id" : 4,
"name" : "55.55.55.55:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 502511,
"optime" : {
"i" : 5028
"optimeDate" : ISODate("T00:05:38Z"),
"lastHeartbeat" : ISODate("T22:47:00Z"),
"pingMs" : 0,
"errmsg" : "error RS102 too stale to catch up"
MongoDB's official documentation has a dedicated procedure for this situation, but I chose the simplest approach. First find the data directory: /etc/mongodb.conf shows dbpath set to /var/lib/mongodb. Then stop the node, delete the data directory, and restart the node. If auth is in use you will also need the key file, and you must make sure the directory exists with permissions that let the node run properly.
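If you would rather not rely on the config file, the running mongod can report the dbpath it was actually started with; a small sketch, assuming you can still connect to the node (on 2.x the parsed options are flat, on 2.6+ the path sits under storage.dbPath):

// Shows the command-line / config-file options the server was started with,
// including the data directory that has to be wiped before the resync.
db.serverCmdLineOpts().parsed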
The MongoDB 2.6.0 official manual is attached below.
A related post from a 51CTO blog:
Problem
    1) The application alerted: Execution Timeout / Couldn't get a connection within the time limit.
    2) The mongod log:
Jun 11 21:48:35 mongod mongod: T21:48:35.122+0800 I NETWORK  [initandlisten] connection accepted from 10.0.0.1:2 connections now open)
Jun 11 21:48:35 mongod mongod: T21:48:35.136+0800 I ACCESS   [conn32] Successfully authenticated as principal __system on local
Jun 11 21:48:35 mongod mongod: T21:48:35.349+0800 I -        [rsSync] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and MB) First element: EOO
Jun 11 21:48:35 mongod mongod: T21:48:35.358+0800 I CONTROL  [rsSync]
Jun 11 21:48:35 mongod mongod: 0x132c032 0x12cb29f8 0x12b2aac 0x9da659 0xae692f 0x106e4dc 0xfd64f5 0xaea0fe 0xaea621 0xebe304 0xf563ae 0xf57c78 0xf4d29b 0x1b5c330 0x7efd30830dc5 0x7efd3055f73d
Jun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----
Jun&11&21:48:35&mongod&mongod:&{"backtrace":[{"b":"400000","o":"F2C032","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EC9988","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB29F8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EB2AAC"},{"b":"400000","o":"5DA659","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"6E692F","s":"_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE"},{"b":"400000","o":"C6E4DC","s":"_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib"},{"b":"400000","o":"C66B1E","s":"_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"C66D69","s":"_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"BD64F5","s":"_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb"},{"b":"400000","o":"6EA0FE","s":"_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b"},{"b":"400000","o":"6EA621","s":"_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb"},{"b":"400000","o":"ABE304","s":"_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE"},{"b":"400000","o":"B563AE","s":"_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE"},{"b":"400000","o":"B57C78","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE"},{"b":"400000","o":"B4D29B","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"400000","o":"175C330","s":"execute_native_thread_routine"},{"b":"7EFD","o":"7DC5"},{"b":"7EFD","o":"F773D","s":"clone"}],"processInfo":{&"mongodbVersion"&:&"3.2.12",&"gitVersion"&:&"ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14",&"compiledModules"&:&[],&"uname"&:&{&"sysname"&:&"Linux",&"release"&:&"3.10.0-514.6.2.el7.x86_64",&"version"&:&"#1&SMP&Thu&Feb&23&03:04:39&UTC&2017",&"machine"&:&"x86_64"&},&"somap"&:&[&{
Jun&11&21:48:35&mongod&mongod:&"elfType"&:&2,&"b"&:&"400000",&"buildId"&:&"BAAC1A970F6D0F06B88D0DE75BF06E4C260939EC"&},&{&"b"&:&"7FFE4D008000",&"elfType"&:&3,&"buildId"&:&"B211AB0CF8DE793F40F3FF996F5B4"&},&{&"b"&:&"7EFD",&"path"&:&"/lib64/libssl.so.10",&"elfType"&:&3,&"buildId"&:&"90EAF65D9B0EEEBF7F7D"&},&{&"b"&:&"7EFD",&"path"&:&"/lib64/libcrypto.so.10",&"elfType"&:&3,&"buildId"&:&"1D98DDD0FA00F92B67AD78C7B7F40"&},&{&"b"&:&"7EFD",&"path"&:&"/lib64/librt.so.1",&"elfType"&:&3,&"buildId"&:&"82E77ADE22BC9FFF8DE7EDF174C28"&},&{&"b"&:&"7EFD30F5D000",&"path"&:&"/lib64/libdl.so.2",&"elfType"&:&3,&"buildId"&:&"C5FAF52EFFBB"&},&{&"b"&:&"7EFD30C5B000",&"path"&:&"/lib64/libm.so.6",&"elfType"&:&3,&"buildId"&:&"721C7CC9488EFA25F83B48AF713AB27DBE48EF3E"&},&{&"b"&:&"7EFD30A45000",&"path"&:&"/lib64/libgcc_s.so.1",&"elfType"&:&3,&"buildId"&:&"408B46E291B2D4C9D165D7E186D40"&},&{&"b"&:&"7EFD",&"path"&:&"/lib64/libpthread.so.0",&"elfType"&:&3,&"buildId"&:&"C3DEB1FA27CD0C1C3CC575B944ABACBA"&},&{&"b"&:&"7EFD",&"path"&:&"/lib64/libc.so.6",&"elfType"&:&3,&"buildId"&:&"8B2C7AA0CAF2A05D0B1F"&},&{&"b"&:&"7EFD319C1000",&"path"&:&"/lib64/ld-linux-x86-64.so.2",&"elfType"&:&3,&"buildId"&:&"8F3E366E2DB73C330A3791DEAE31AE"&},&{&"b"&:&"7EFD",&"path"&:&"/lib64/libgssapi_krb5.so.2",&"elfType"&:&3,&"buildId"&:&"A9EE2E508E4434F10"&},&{&"b"&:&"7EFD2FF33000",&"path"&:&"/lib64/libkrb5.so.3",&"elfType"&:&3,&"buildId"&:&"E09A34D9083DC6FEAF31DEEE2836D"&},&{&"b"&:&"7EFD2FD2F000",&"path"&:&"/lib64/libcom_err.so.2",&"elfType"&:&3,&"buildId"&:&"BF54B7CFBBB8BBBC67"&},&{&"b"&:&"7EFD2FAFD000",&"path"&:&"/lib64/libk5crypto.so.3",&"elfType"&:&3,&"buildId"&:&"BF8F00D7CB849ADB0B7AD66AEE6A49C"&},&{&"b"&:&"7EFD2F8E7000",&"path"&:&"/lib64/libz.so.1",&"elfType"&:&3,&"buildId"&:&"EA8E45DC8E395CC5ED97A1F1E0B65"&},&{&"b"&:&"7EFD2F6D8000",&"path"&:&"/lib64
Jun&11&21:48:35&mongod&mongod:&/libkrb5support.so.0",&"elfType"&:&3,&"buildId"&:&"1E7A92FDD6FBBCA2E147E72B6B6E1F"&},&{&"b"&:&"7EFD2F4D4000",&"path"&:&"/lib64/libkeyutils.so.1",&"elfType"&:&3,&"buildId"&:&"2E01D5AC08CAAB96B292AC58BC30A263"&},&{&"b"&:&"7EFD2F2BA000",&"path"&:&"/lib64/libresolv.so.2",&"elfType"&:&3,&"buildId"&:&"FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158"&},&{&"b"&:&"7EFD2F093000",&"path"&:&"/lib64/libselinux.so.1",&"elfType"&:&3,&"buildId"&:&"7854DF3BCF8DE6892"&},&{&"b"&:&"7EFD2EE32000",&"path"&:&"/lib64/libpcre.so.1",&"elfType"&:&3,&"buildId"&:&"AE64AA461A26E01F749D56DD0AE1"&}&]&}}
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132c032]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10logContextEPKc+0x138) [0x12c9988]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12b29f8]
Jun 11 21:48:35 mongod mongod: mongod(+0xEB2AAC) [0x12b2aac]
Jun 11 21:48:35 mongod mongod: mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da659]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xBF) [0xae692f]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x46C) [0x106e4dc]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x5E) [0x1066b1e]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0xA9) [0x1066d69]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb+0xB5) [0xfd64f5]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b+0x16E) [0xaea0fe]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb+0x1B1) [0xaea621]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE+0x144) [0xebe304]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE+0x98E) [0xf563ae]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE+0xD08) [0xf57c78]
Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl13runSyncThreadEv+0x2BB) [0xf4d29b]
Jun 11 21:48:35 mongod mongod: mongod(execute_native_thread_routine+0x20) [0x1b5c330]
Jun 11 21:48:35 mongod mongod: libpthread.so.0(+0x7DC5) [0x7efd30830dc5]
Jun 11 21:48:35 mongod mongod: libc.so.6(clone+0x6D) [0x7efd3055f73d]
Jun 11 21:48:35 mongod mongod: ----- END BACKTRACE -----
Jun 11 21:48:35 mongod mongod: T21:48:35.389+0800 F -        [rsSync] terminate() called. An exception is active; attempting to gather more information
Jun 11 21:48:35 mongod mongod: T21:48:35.393+0800 F -        [rsSync] DBException::toString(): 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and MB) First element: EOO
Jun 11 21:48:35 mongod mongod: Actual exception type: mongo::MsgAssertionException
Jun 11 21:48:35 mongod mongod: 0x132c032 0x132bb82 0x1bbd3e0 0x1b5c330 0x7efd30830dc5 0x7efd3055f73d
Jun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----

I. Cluster environment
    3 mongos, 3 config servers, and 3 shards (each shard a three-member replica set: one primary, two secondaries).
II. Incident description
    Incident 1: During a promotion, concurrency was too high. The mongod connection count climbed (each incoming connection allocates some memory), memory use ballooned, queries slowed down across the board, and requests piled up (the application reported timeouts). mongod was restarted directly, without stepping the primary down first, and afterwards secondary node 2 showed the error above. The node was removed from the set temporarily, to be repaired when there was time (cause unknown).
    Incident 2: Six days later, with node 2 still unrepaired, primary node 1 of the same shard unexpectedly hit the same error (cause unknown). That left a single working node, so both broken nodes were repaired overnight.
    Incident 3: Two days after the second incident, node 3 of the same shard hit the same error again (cause unknown). A cold sweat: every member of the shard had now failed in turn; fortunately each was fixed in time.
III. Resolution steps
    1. Judging from the error message, this looked like data-block corruption. The shard could no longer accept updates, although reads still worked.
    2. Locate the problem: oplog.rs is corrupted.
db.oplog.rs.find()
error: {
        "$err" : "BSONObj size: (0x1073656E) is invalid. Size must be between 0 and MB) First element: Status: ?type=100",
        "code" : 10334
}
    3. Rebuild the oplog (the individual commands below are collected into one sketch at the end of this post).
        1) Find the most recent oplog entry:
db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( { $natural : -1 } ).limit(1).next()
{ "ts" : Timestamp(, 46), "h" : NumberLong("9430008") }
        2) Save that entry:
db.temp.save( db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( { $natural : -1 } ).limit(1).next() )
// confirm the operation, this is important
db.temp.find()
        3) Drop oplog.rs (the physical files are not deleted):
db.oplog.rs.drop()
        4) Create a new oplog:
db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } )
"errmsg" : "not authorized on local to execute command { create: \"oplog.rs\", capped: true, size: .0 }",
"code" : 13
// Painful: recreating the capped collection was refused as not authorized, even though we already had the root role, and oplog.rs must be a capped collection.
        We changed the node's role from shardsvr to configsvr and removed the keyFile authentication (effectively removing access control), restarted the service, and ran the command again:
db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } );
{ "ok" : 1 }
        Why change the role? Because with the shardsvr role the mongod would not start once keyFile authentication was removed.
        5) Write the saved entry back into the new oplog:
db.oplog.rs.save( db.temp.findOne() )
        6) Confirm:
db.oplog.rs.find()
{ "ts" : Timestamp(, 46), "h" : NumberLong("9430008") }
        7) Restore the original configuration file, restart the service, and let the node rejoin the cluster. Check the replication lag:
rs:SECONDARY> db.printSlaveReplicationInfo();
source: mongod122:10000
        syncedTo: Mon Jun 12 :41 GMT+0800 (CST)
        0 secs (0 hrs) behind the primary
source: mongod121:10000
        syncedTo: Sun Jun 11 :12 GMT+0800 (CST)
        11969 secs (3.32 hrs) behind the primary
    Once the lag caught up, everything returned to normal. To this day we still do not know what caused this shard to hit the error repeatedly; if we learn more, this post will be updated.
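For reference, here are the rebuild commands from steps 1) to 6) above, collected into a single mongo shell session. This is only a sketch of the post's own procedure, not an official recipe: the 50 GB size and the temp collection name come from the post, and the authorization workaround (switching the role and disabling keyFile auth) still has to be done outside the shell.

// Run in the mongo shell on the affected member.
use local

// 1) + 2): save the newest oplog entry so replication has a resume point
var last = db.oplog.rs.find({}, { ts: 1, h: 1 }).sort({ $natural: -1 }).limit(1).next();
db.temp.save(last);
db.temp.find();                  // confirm it really was saved

// 3) + 4): drop the corrupted oplog and recreate it as a capped collection
db.oplog.rs.drop();
db.runCommand({ create: "oplog.rs", capped: true, size: 50 * 1024 * 1024 * 1024 });

// 5) + 6): put the saved entry back and verify
db.oplog.rs.save(db.temp.findOne());
db.oplog.rs.find();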