
MOI Pro-AMD random self-rebooting problem

Posted: Fri Oct 14, 2016 9:08 pm
by sir
We have a couple of MOI Pro-AMD streamers. Our problem concerns the MOI Pro-AMD configuration with 4x TBS6910 PCIe cards, all used with CAM modules. The machine keeps rebooting itself at random: sometimes it runs without problems for hours, sometimes it resets within minutes. These are not kernel panics, and we can't find anything in the system journal or logs.

The only interesting part from dmesg is:

Code:
[    4.269262] [Hardware Error]: MC4 Error (node 0): Watchdog timeout due to lack of progress.
[    4.271749] [Hardware Error]: Error Status: System Fatal error.
[    4.272970] [Hardware Error]: CPU:0 (16:0:1) MC4_STATUS[Over|UE|-|PCC|AddrV|-|-]: 0xf600000000070f0f
[    4.275455] [Hardware Error]: MC4_ADDR: 0x00000000fe80c000
[    4.276651] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)


which may point to a hardware issue. It only appears in dmesg after an unclean reboot; after a clean, user-initiated reboot the message does not occur.
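
The flags printed in the MC4_STATUS line can also be read directly from the top bits of the raw value. Below is a minimal Python sketch, assuming the standard x86 machine-check status layout (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC). The helper name `decode_mci_status` is only illustrative and not part of any tool.

```python
# Architectural flag bits of an x86 MCi_STATUS register (bits 63..57).
# The helper name below is hypothetical, chosen for this sketch only.
MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (shown as UE in dmesg)
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Map each flag name to True/False for a raw 64-bit status value."""
    return {
        name: bool((status >> bit) & 1)
        for bit, name in MCI_STATUS_FLAGS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the dmesg output above.
    flags = decode_mci_status(0xF600000000070F0F)
    for name, is_set in flags.items():
        print(f"{name:6s} {'set' if is_set else '-'}")
```

For the value above this reports OVER, UC, ADDRV and PCC as set and MISCV as clear, which agrees with the flag list dmesg printed. The sketch does not decode the lower bits that carry the model-specific error code.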

We also have a MOI Pro-AMD with 4x TBS6991SE PCIe cards, and that configuration does not show the issue. We removed the 4x TBS6910 cards and replaced them with 4x TBS6991SE cards (using the same CAM modules taken from the TBS6910 cards). The problem hasn't occurred since last week. We also put the removed TBS6910 cards into a standard workstation computer, where they work fine.

So it looks like the problem only appears when the TBS6910 is combined with the MOI Pro-AMD (which we got from your DE supplier as a complete setup).

So far we tried:

- updating to stock CentOS 7
- the newest TBS drivers
- the newest open-source drivers
- adding an external power supply for the TBS6910 cards

Nothing helped.

We also have an interesting case with a single MOI Pro-AMD with 2x TBS6205 cards. In this configuration a reboot happens after every "sensors" command, not randomly as in the setup described above. We also tried the "sensors" command without the TBS drivers loaded:

Code:
[root@streamer ~]# rmmod tbs_pcie_dvb
[root@streamer ~]# rmmod tbs6205fe   
[root@streamer ~]# uname -a
Linux streamer.local 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@streamer ~]# sensors
radeon-pci-0008
Adapter: PCI adapter
... (reboot)


After the reboot, exactly the same [Hardware Error] message is in dmesg (as shown above). Maybe this problem is connected with the new cards that use an FPGA PCIe bridge?
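
To check that two reboots really leave the same error record, the [Hardware Error] lines of two saved dmesg captures can be compared with the timestamps stripped. This is a minimal Python sketch, assuming each capture was saved to a text file (for example with `dmesg > file`). The function name and the two file arguments are hypothetical.

```python
import re
import sys

# Matches the leading kernel timestamp, e.g. "[    4.269262] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def hardware_error_lines(dmesg_text: str) -> list:
    """Return the [Hardware Error] lines of a capture, timestamps removed."""
    lines = []
    for line in dmesg_text.splitlines():
        stripped = _TIMESTAMP.sub("", line.strip())
        if stripped.startswith("[Hardware Error]:"):
            lines.append(stripped)
    return lines


if __name__ == "__main__":
    # Usage: python compare_mce.py first_capture.txt second_capture.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
            same = hardware_error_lines(first.read()) == hardware_error_lines(second.read())
        print("identical" if same else "different")
```

Stripping the timestamps matters because they differ on every boot even when the error record itself is the same.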

Re: MOI Pro-AMD random self-rebooting problem

Posted: Mon Oct 17, 2016 9:49 am
by steven
Hi

Thanks for your detailed feedback. Can you take a picture of your MOIPro-AMD mainboard and power supply? Also, how do you provide the extra power for the 6910? You can take a picture of that too.

As for the sensors command, we tested it on an AMD server at our side and it works well : )

Let us focus on the 6910 issue first, which is serious.

Thanks

Kind Regards

steven

Re: MOI Pro-AMD random self-rebooting problem

Posted: Mon Oct 17, 2016 4:29 pm
by sir
Hi steven,

mainboard: https://i.imgsafe.org/488a66c883.jpg and https://i.imgsafe.org/488afcafcf.jpg

PSU: https://i.imgsafe.org/488ade1a15.jpg and https://i.imgsafe.org/488b120429.jpg

For external 12 V power for the TBS6910 cards we used a standard 300 W ATX PSU connected via Molex connectors to these power cable adapters: https://i.imgsafe.org/488af0476e.jpg (we received them with cards that we bought separately). Unfortunately this did not help. Do you think these random restarts are associated with the power supply?

Re: MOI Pro-AMD random self-rebooting problem

Posted: Wed Oct 19, 2016 4:11 pm
by stilmant
Just for info, I personally can't access the photos because that site is blocked by my ISP. This forum can host your photos; you just have to attach them to your post.

Re: MOI Pro-AMD random self-rebooting problem

Posted: Thu Nov 24, 2016 8:57 pm
by diabloss
The question is: how big is the load on the CPU?

My wild guess is heat. Removing the heatsink and applying some quality thermal compound such as Arctic Silver should work for sure.
Once again, the CPU load needs to be known before anything else.

Re: MOI Pro-AMD random self-rebooting problem

Posted: Wed Jun 28, 2017 7:37 pm
by tonygables
Hi,

Were you able to solve this issue? We are facing similar problems with TBS6910s.

We have 2 x TBS6910 with 4 CAMs on a MOI Pro-AMD. With all tuners locked and the CAMs descrambling, the MOI starts rebooting every 3-4 minutes.

When we lock only 2 tuners, the problem doesn't happen. When we start locking additional tuners, the MOI starts rebooting again.

The same problem happens on HP ML150/ML350 servers with the same cards. When we check the logs on the HP server, we see "Unrecoverable PCIe error" before the server displays an alarm and becomes unresponsive.

In contrast, we have an HP server with 3 x TBS6991SE cards and 6 CAM modules, and we don't observe this problem there.

If anyone else has faced this problem, could you please comment?