10-21-11, 04:28 AM   #3
echevreau
Registered User
 
Join Date: Oct 2011
Posts: 3
Re: NVRM Xid 26 error

The context is this:

An MPI+GPU job runs for a good half hour and then hangs. The call stacks show that all processes are waiting in an MPI_Alltoall except one, the process on node bullx002, which is always inside GPU code.

The customer killed all the MPI processes and then tried to run "deviceQuery" on bullx002. It stays blocked, and the log shows the message with Xid code 26.
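
To check whether other nodes log the same message, something like the following could be used. It is only a minimal sketch: it assumes the driver writes Xid lines to the kernel log in the usual "NVRM: Xid (<PCI id>): <code>, <details>" form, and the function name find_xid_events is made up for this example.

```python
import re
import sys

# Assumed format of NVIDIA driver Xid lines in the kernel log:
#   NVRM: Xid (<PCI id>): <code>, <details>
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def find_xid_events(lines):
    """Return (pci_id, xid_code) pairs for every NVRM Xid line found."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


if __name__ == "__main__":
    # Read log text from standard input and report each Xid event.
    for pci_id, code in find_xid_events(sys.stdin):
        print(f"GPU {pci_id}: Xid {code}")
```

It reads the log from standard input, so it can be fed with the output of dmesg on each node (for example "dmesg | python find_xid.py", where the script name is arbitrary).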