MOI Pro-AMD random self-rebooting problem

Post by sir » Fri Oct 14, 2016 9:08 pm

We have a couple of MOI Pro-AMD streamers. Our problem is with one configuration: a MOI Pro-AMD with 4x TBS6910 PCIe cards, all of them fitted with CAM modules. The machine keeps rebooting itself at random. Sometimes it runs without problems for hours, but sometimes it resets within minutes. These are not kernel panics, and we can't find anything in the system journal/logs either.

The only interesting part of dmesg is:

[    4.269262] [Hardware Error]: MC4 Error (node 0): Watchdog timeout due to lack of progress.
[    4.271749] [Hardware Error]: Error Status: System Fatal error.
[    4.272970] [Hardware Error]: CPU:0 (16:0:1) MC4_STATUS[Over|UE|-|PCC|AddrV|-|-]: 0xf600000000070f0f
[    4.275455] [Hardware Error]: MC4_ADDR: 0x00000000fe80c000
[    4.276651] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)


which may suggest a hardware issue. The message appears in dmesg only after an unclean reboot (after a clean, user-initiated reboot it does not appear).
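The MC4_STATUS value in the log can be decoded by hand to see why it points at hardware. The sketch below is a minimal, hypothetical decoder (the function names are made up for illustration). It assumes the architectural x86 MCi_STATUS flag bits and "bus error" code format, plus the AMD extended error code in bits 20:16. Its extended-code table holds only the one entry quoted in the log above, because the full table is CPU-family specific.

```python
import re

# Architectural x86 MCi_STATUS flag bits (machine-check architecture).
STATUS_FLAGS = [
    (63, "VAL"),    # status register holds a valid error
    (62, "OVER"),   # a previous error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC holds extra information
    (58, "ADDRV"),  # MCi_ADDR holds the error address
    (57, "PCC"),    # processor context corrupt: cannot safely continue
]

# Sub-field names for the "bus error" code format 0000 1PPT RRRR IILL (bits 15:0).
LL = ["RESV", "L1", "L2", "L3/GEN"]
II = ["MEM", "RESV", "IO", "GEN"]
PP = ["SRC", "RES", "OBS", "GEN"]
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]

# Assumption: only the MC4 extended error code seen in this thread's log.
MC4_XEC = {0x07: "Watchdog timeout due to lack of progress"}


def parse_mc4_status(dmesg_text):
    """Return the MC4_STATUS value found in a dmesg excerpt, or None."""
    m = re.search(r"MC4_STATUS\[[^\]]*\]:\s*(0x[0-9a-fA-F]+)", dmesg_text)
    return int(m.group(1), 16) if m else None


def decode_mc4_status(status):
    """Split an MC4_STATUS value into flags, extended code and bus-error fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    xec = (status >> 16) & 0x1F   # AMD extended error code, bits 20:16
    code = status & 0xFFFF        # architectural error code, bits 15:0
    result = {
        "flags": flags,
        "xec": xec,
        "xec_desc": MC4_XEC.get(xec, "not covered by this sketch"),
        "bus": None,
    }
    if (code & 0xF800) == 0x0800:  # bit 11 set, bits 15:12 clear: bus error
        r = (code >> 4) & 0xF
        result["bus"] = {
            "level": LL[code & 0x3],
            "mem_io": II[(code >> 2) & 0x3],
            "mem_tx": RRRR[r] if r < len(RRRR) else "RESV",
            "part_proc": PP[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
        }
    return result


log = ("[    4.272970] [Hardware Error]: CPU:0 (16:0:1) "
       "MC4_STATUS[Over|UE|-|PCC|AddrV|-|-]: 0xf600000000070f0f")
print(decode_mc4_status(parse_mc4_status(log)))
```

For 0xf600000000070f0f the flags come out as VAL, OVER, UC, EN, ADDRV and PCC: an uncorrected error with the processor context marked corrupt, which the platform treats as fatal. The low 16 bits decode as a generic bus error that timed out, matching the kernel's own decode line. The timestamp of about 4 seconds after boot is consistent with the error having been latched before the reset and only logged at the next boot, which would fit it appearing only after unclean reboots.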

We also have a MOI Pro-AMD with 4x TBS6991SE PCIe cards, and that configuration does not have this issue. We removed the 4x TBS6910 cards and replaced them with 4x TBS6991SE cards (using the same CAM modules taken from the TBS6910s). The problem hasn't occurred since last week. We also put the removed TBS6910 cards into a standard workstation, and they work fine there.

So it looks like the problem occurs only when we combine the TBS6910 with the MOI Pro-AMD (which we got from your DE supplier as a complete setup).

So far we have tried:

- updating to stock CentOS 7
- the newest TBS drivers
- the newest open-source drivers
- adding an external power supply for the TBS6910 cards

Nothing helped.

We also have an interesting case: a single MOI Pro-AMD with 2x TBS6205 cards. In this configuration the machine reboots after every "sensors" command, rather than randomly as in the setup described above. We also tried the 'sensors' command without the TBS drivers loaded:

[root@streamer ~]# rmmod tbs_pcie_dvb
[root@streamer ~]# rmmod tbs6205fe   
[root@streamer ~]# uname -a
Linux streamer.local 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@streamer ~]# sensors
radeon-pci-0008
Adapter: PCI adapter
... (reboot)


After the reboot, exactly the same [Hardware Error] message appears in dmesg (as quoted above). Could this problem be related to the newer cards with an FPGA PCIe bridge?
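Since `sensors` printed the radeon chip before the box went down, one way to narrow down which hwmon device triggers the reset is to read the chips one at a time and force a progress marker to disk before each read. Below is a minimal sketch of that idea; `probe_chips` and the log file name are hypothetical. It assumes the standard `/sys/class/hwmon/hwmonN` layout, and that on older kernels such as 3.10 some drivers expose `name` and `*_input` under `hwmonN/device/` instead. After a reset, the last "reading" line in the log names the attribute whose access was in progress.

```python
import glob
import os


def read_text(path):
    with open(path) as f:
        return f.read().strip()


def chip_name(hwmon_dir):
    # Older kernels may put the name attribute under device/ instead.
    for candidate in ("name", os.path.join("device", "name")):
        path = os.path.join(hwmon_dir, candidate)
        if os.path.exists(path):
            return read_text(path)
    return "unknown"


def mark(log, line):
    log.write(line + "\n")
    log.flush()
    os.fsync(log.fileno())  # force the line to disk before touching hardware


def probe_chips(base="/sys/class/hwmon", log_path="hwmon-probe.log"):
    """Read every *_input attribute chip by chip, logging progress first."""
    with open(log_path, "a") as log:
        for hwmon_dir in sorted(glob.glob(os.path.join(base, "hwmon*"))):
            mark(log, "chip %s (%s)" % (hwmon_dir, chip_name(hwmon_dir)))
            inputs = glob.glob(os.path.join(hwmon_dir, "*_input"))
            inputs += glob.glob(os.path.join(hwmon_dir, "device", "*_input"))
            for attr in sorted(inputs):
                name = os.path.basename(attr)
                mark(log, "  reading %s" % name)
                try:
                    value = read_text(attr)
                except OSError as exc:
                    value = "error: %s" % exc
                mark(log, "  %s = %s" % (name, value))
            mark(log, "done %s" % hwmon_dir)
```

Calling `probe_chips()` as root on the affected machine, then inspecting `hwmon-probe.log` after it comes back up, should show whether the culprit is a CPU/northbridge sensor or something else on the bus.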

Re: MOI Pro-AMD random self-rebooting problem

Post by steven » Mon Oct 17, 2016 9:49 am

Hi

Thanks for your detailed feedback. Can you take a picture of your MOI Pro-AMD mainboard and power supply? Also, how do you provide the extra power for the 6910 cards? Please take a picture of that too.

As for the sensors command, we tested it on an AMD server on our side, and it works well : )

Let us focus on the 6910 issue first, as it is the more serious one.

Thanks

Kind Regards

steven

Re: MOI Pro-AMD random self-rebooting problem

Post by sir » Mon Oct 17, 2016 4:29 pm

Hi steven,

mainboard: https://i.imgsafe.org/488a66c883.jpg and https://i.imgsafe.org/488afcafcf.jpg

PSU: https://i.imgsafe.org/488ade1a15.jpg and https://i.imgsafe.org/488b120429.jpg

For external 12 V power for the TBS6910 cards, we used a standard 300 W ATX PSU connected via Molex connectors to these power cable adapters: https://i.imgsafe.org/488af0476e.jpg (we received them with cards that we bought separately). Unfortunately, this didn't help. Do you think these random restarts are related to the power supply?

Re: MOI Pro-AMD random self-rebooting problem

Post by stilmant » Wed Oct 19, 2016 4:11 pm

Just for info: I personally can't view the photos, because that site is blocked by my ISP. This forum can host your photos; you just have to attach them to your post.

Re: MOI Pro-AMD random self-rebooting problem

Post by diabloss » Thu Nov 24, 2016 8:57 pm

The question is: how high is the CPU load?

My wild guess is overheating. Remove the heatsink and apply some quality Arctic Silver or any other quality CPU thermal compound; that should work for sure.
Once again, the CPU load needs to be known before anything else.
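Because the resets come without warning, a load reading taken after the reboot says little about the moment of the crash. A minimal sketch for capturing it: log the 1/5/15-minute load averages from `os.getloadavg()` at a fixed interval, forcing each line to disk so the last values before an unclean reboot survive. The function name `log_load`, the file name and the 5-second interval are arbitrary placeholders. Note that on Linux the load average also counts tasks in uninterruptible sleep (for example waiting on I/O), so it is not a pure CPU-usage figure.

```python
import multiprocessing
import os
import time


def log_load(log_path="load.log", samples=None, interval=5.0):
    """Append timestamped load averages; samples=None means run forever."""
    count = 0
    with open(log_path, "a") as log:
        # Record the core count so the load figures can be put in context.
        log.write("# cpus=%d\n" % multiprocessing.cpu_count())
        while samples is None or count < samples:
            one, five, fifteen = os.getloadavg()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("%s %.2f %.2f %.2f\n" % (stamp, one, five, fifteen))
            log.flush()
            os.fsync(log.fileno())  # keep the line even if the box resets next
            count += 1
            if samples is None or count < samples:
                time.sleep(interval)
```

Left running in the background (for example `log_load()` in a screen session), the tail of `load.log` after a reset shows the load just before it.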

Re: MOI Pro-AMD random self-rebooting problem

Post by tonygables » Wed Jun 28, 2017 7:37 pm

Hi,

Were you able to solve this issue? We are facing similar problems with TBS6910s.

We have 2x TBS6910 with 4 CAMs in a MOI Pro-AMD. With all tuners locked and the CAMs descrambling, the MOI starts rebooting every 3-4 minutes.

When we lock only 2 tuners, the problem doesn't happen. As soon as we lock additional tuners, the MOI starts rebooting again.

The same problem happens with HP ML150/ML350 servers fitted with the same cards. When we check the logs on the HP server, we see "Unrecoverable PCIe error" before the server raises an alarm and becomes unresponsive.

In contrast, we have an HP server with 3x TBS6991SE cards and 6 CAM modules, and we don't observe this problem.

If anyone else has faced this problem, can you please comment?