Bug 115800

Summary: nvidia: XEN support
Product: [openSUSE] openSUSE 10.3
Reporter: Stefan Dirsch <sndirsch>
Component: X11 3rd Party Driver
Assignee: Roland Hui <rohui>
Status: RESOLVED WONTFIX
QA Contact: Stefan Dirsch <sndirsch>
Severity: Enhancement
Priority: P3 - Medium
CC: aritger, atortola, eich, koenig, matias.krempel, max, mt, Winfrid.Tschiedel
Version: Final
Target Milestone: ---
Hardware: Other
OS: All
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: nvidia driver fix for Xen kernel
This additional patch seems to fix the build
nv-fix-gartaddr-xen.diff
nv-fix-gartaddr-xen.diff
Regenerated consecutive patch

Description Stefan Dirsch 2005-09-08 08:27:29 UTC
Kurt Garloff wrote:

With a patch similar to the one in ... (see following attachment), I got
nvidia to work under Xen.

What it does is basically remove a confusion between gart and phys
addresses. On plain x86, they happen to be the same. Many drm and agp
drivers had that wrong, but it got fixed in 2.6.13.

I'm pretty confident that the changes are good and won't hurt anyone.
But checking is better ... certainly at this point in time.

Is it possible to run this on an nvidia machine and see
whether it works? Let's first get it working on a plain kernel ...

I had a short look at the DRM drivers in 2.6.13; they seemed to have all
been fixed.
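
To illustrate the gart/phys distinction described above, here is a minimal user-space sketch in C. It is a toy model, not kernel or NVIDIA code: the names toy_p2m, toy_phys_to_gart_native and toy_phys_to_gart_xen, and the table contents, are made up for illustration. It assumes that under Xen the address a device has to use is obtained by translating the guest's physical frame number to a machine frame number, while on plain x86 the two are identical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_FRAMES      4

/* Hypothetical guest-frame -> machine-frame table (values are invented). */
static const uint64_t toy_p2m[TOY_FRAMES] = { 0x9a, 0x17, 0x203, 0x44 };

/* Plain x86: the address the device sees equals the physical address. */
static uint64_t toy_phys_to_gart_native(uint64_t phys)
{
    return phys;
}

/* Toy Xen guest: translate the frame number, keep the offset in the page. */
static uint64_t toy_phys_to_gart_xen(uint64_t phys)
{
    uint64_t pfn = phys >> TOY_PAGE_SHIFT;
    uint64_t mfn = toy_p2m[pfn % TOY_FRAMES]; /* wrap only to stay in the toy table */

    return (mfn << TOY_PAGE_SHIFT) | (phys & TOY_OFFSET_MASK);
}
```

With this table, toy_phys_to_gart_native(0x1234) returns 0x1234 unchanged, while toy_phys_to_gart_xen(0x1234) returns 0x17234: same page offset, different frame. Code that passes a phys address where a gart address is expected only works in the first case.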
Comment 1 Stefan Dirsch 2005-09-08 08:28:43 UTC
Created attachment 49159 [details]
nvidia driver fix for Xen kernel
Comment 2 Stefan Dirsch 2005-09-08 08:32:56 UTC
Sorry, Kurt, but this patch doesn't work for me.

/tmp/NVIDIA-Linux-x86-1.0-7676-pkg1/usr/src/nv/nv-vm.c: In function ‘nv_vm_malloc_pages’:
/tmp/NVIDIA-Linux-x86-1.0-7676-pkg1/usr/src/nv/nv-vm.c:241: error: implicit declaration of function ‘phys_to_gart’
make[4]: *** [/tmp/NVIDIA-Linux-x86-1.0-7676-pkg1/usr/src/nv/nv-vm.o] Error 1
make[3]: *** [_module_/tmp/NVIDIA-Linux-x86-1.0-7676-pkg1/usr/src/nv] Error 2
make[2]: *** [modules] Error 2
NVIDIA: left KBUILD.

This is with Kernel 2.6.13-8.
Comment 3 Stefan Dirsch 2005-09-08 08:47:15 UTC
Created attachment 49160 [details]
This additional patch seems to fix the build
Comment 4 Stefan Dirsch 2005-09-08 09:05:20 UTC
I currently have the nvidia driver running with the two patches applied. It works
so far (GeForce4 Ti 4600, IA32, Kernel 2.6.13-8-default, 1.0-7676 NVIDIA
driver). I haven't tested this on x86_64 yet. Anyway, I would like to hear
comments from Kurt and Andy on the two patches before we consider including
them for RC2.
Comment 5 Stefan Dirsch 2005-09-08 14:06:59 UTC
The patch doesn't build on x86_64 at all:

nv-linux.h:1057: error: implicit declaration of function ‘virt_to_gart’
nv-linux.h:1138: error: implicit declaration of function ‘gart_to_virt’
nv-linux.h:1057: error: implicit declaration of function ‘virt_to_gart’
nv-linux.h:1138: error: implicit declaration of function ‘gart_to_virt’
nv-linux.h:1057: error: implicit declaration of function ‘virt_to_gart’
nv-linux.h:1138: error: implicit declaration of function ‘gart_to_virt’
nv-linux.h:1057: error: implicit declaration of function ‘virt_to_gart’
nv-linux.h:1138: error: implicit declaration of function ‘gart_to_virt’
nv-linux.h:1057: error: implicit declaration of function ‘virt_to_gart’
nv-linux.h:1138: error: implicit declaration of function ‘gart_to_virt’

Kurt, please comment.
Comment 6 Kurt Garloff 2005-09-09 09:13:07 UTC
Stefan, thanks for testing!   
   
Sorry for screwing up the initial patch :-(   
In agp.h, I read:    
#define virt_to_gart(x) (phys_to_gart(virt_to_phys(x)))   
#define gart_to_virt(x) (phys_to_virt(gart_to_phys(x)))  
which is the right implementation.  
 
Including asm/agp.h from nv-linux.h is the right solution. But let me do a 
compile test first ... 
Comment 7 Kurt Garloff 2005-09-09 09:24:15 UTC
OK, got some more coffee. The definition is in drivers/char/agp/agp.h. 
So a solution that would work with older kernels as well could look like 
this hunk in nv-linux.h: 
#if defined (CONFIG_AGP) || defined (CONFIG_AGP_MODULE)
#define AGPGART
#include <linux/agp_backend.h>
#include <linux/agpgart.h>
#include <asm/agp.h>
#ifndef phys_to_gart
/* Older kernels: asm/agp.h has no phys_to_gart(), fall back to bus addresses */
#define phys_to_gart(x) virt_to_bus(phys_to_virt(x))
#define gart_to_phys(x) virt_to_phys(bus_to_virt(x))
#define virt_to_gart(x) virt_to_bus(x)
#define gart_to_virt(x) bus_to_virt(x)
#else
/* phys_to_gart() exists, but virt_to_gart() may only be in drivers/char/agp/agp.h */
#ifndef virt_to_gart
#define virt_to_gart(x) (phys_to_gart(virt_to_phys(x)))
#define gart_to_virt(x) (phys_to_virt(gart_to_phys(x)))
#endif
#endif
#endif
 
I'll create a patch and (compile-) test it. 
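
The guard pattern in the hunk above (define a helper only if no header has provided it yet, and build it from the lower-level pair) can be tried outside the kernel. The following is a minimal sketch under stated assumptions: every toy_* name and the TOY_PAGE_OFFSET value are hypothetical stand-ins, and the identity phys/gart mapping models plain x86 only.

```c
/* Toy stand-ins for the kernel helpers; all toy_* names are hypothetical. */
#define TOY_PAGE_OFFSET 0xC0000000UL

static unsigned long toy_virt_to_phys(unsigned long v)
{
    return v - TOY_PAGE_OFFSET;
}

static unsigned long toy_phys_to_virt(unsigned long p)
{
    return p + TOY_PAGE_OFFSET;
}

/* Pretend the platform header supplies only the phys<->gart pair. */
#define toy_phys_to_gart(x) (x)
#define toy_gart_to_phys(x) (x)

/* Supply the virt<->gart pair only if nobody has defined it yet. */
#ifndef toy_virt_to_gart
#define toy_virt_to_gart(x) (toy_phys_to_gart(toy_virt_to_phys(x)))
#define toy_gart_to_virt(x) (toy_phys_to_virt(toy_gart_to_phys(x)))
#endif
```

Here toy_virt_to_gart(0xC0001000UL) evaluates to 0x1000UL, and toy_gart_to_virt() maps it back. Because the virt/gart pair is composed from the phys/gart pair, swapping in a non-identity toy_phys_to_gart changes both without touching the callers.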
Comment 8 Kurt Garloff 2005-09-10 12:46:16 UTC
Created attachment 49511 [details]
nv-fix-gartaddr-xen.diff

This fix builds and seems to work.
(More tests under Xen required to validate.)
Comment 9 Stefan Dirsch 2005-09-10 18:05:56 UTC
build tests (x86 + x86_64) done. runtime tests (32bit + 64bit/32bit) will 
follow. 
Comment 10 Stefan Dirsch 2005-09-11 15:25:22 UTC
> runtime tests (32bit + 64bit/32bit) will follow. 
done. works fine for me.
Comment 11 Kurt Garloff 2005-09-12 07:50:57 UTC
Stefan, thanks for testing!  
  
While the patch is harmless for native x86/x86-64 nVidia, it's unfortunately  
not enough to make the nVidia driver work with Xen.  
  
__nv_disable_caches() and __nv_enable_caches() access cr4, which Xen won't
allow.
These functions are called from __nv_setup_pat_entries() and
__nv_restore_pat_entries(). Passing nv_disable_pat=1 to the module works
around this, so the module loads and initializes properly.
  
Upon startup of X11, it still crashes (on x86-64): 
 
Sep 11 00:23:02 prescott kernel: general protection fault: 0000 [1]   
Sep 11 00:23:02 prescott kernel: CPU 0   
Sep 11 00:23:02 prescott kernel: Modules linked in: nvidia bridge [...]  
Sep 11 00:23:02 prescott kernel: Pid: 17621, comm: X Tainted: P     U  
2.6.13-10-xen  
Sep 11 00:23:02 prescott kernel: RIP: e030:[<ffffffff884b0ed4>]  
<ffffffff884b0ed4>{:nvidia:_nv002491rm+0}  
Sep 11 00:23:02 prescott kernel: RSP: e02b:ffff880028d11bc0  EFLAGS: 00010202  
Sep 11 00:23:02 prescott kernel: RAX: 0000000000000000 RBX: ffff880028d11be8  
RCX: 00000000bfebfbff  
Sep 11 00:23:02 prescott kernel: RDX: 0000000000000000 RSI: 0000000000000001  
RDI: ffff880030712000  
Sep 11 00:23:02 prescott kernel: RBP: ffffffff888b4480 R08: ffff880028d11bdc  
R09: ffff880028d11bd8  
Sep 11 00:23:02 prescott kernel: R10: ffff880028d11be8 R11: 000000000000001c  
R12: ffff880030712000  
Sep 11 00:23:02 prescott kernel: R13: ffff880028670000 R14: ffffc2001107b000  
R15: ffff88002ba7a800  
Sep 11 00:23:02 prescott kernel: FS:  00002aaaab35f0a0(0000)  
GS:ffffffff804bbc80(0000) knlGS:0000000000000000  
Sep 11 00:23:02 prescott kernel: CS:  e033 DS: 0000 ES: 0000   
Sep 11 00:23:02 prescott kernel: Process X (pid: 17621, threadinfo  
ffff880028d10000, task ffff880038efa880)  
Sep 11 00:23:02 prescott kernel: Stack: ffffffff88497b5c ffff880028d11be0  
ffffffff885d0e6a 000208000000651d  
Sep 11 00:23:02 prescott kernel:        00000f41bfebfbff 49656e69756e6547  
bfebfbff6c65746e 0001040f00000000  
Sep 11 00:23:02 prescott kernel:        ffff880000000000 ffff88002b87b000  
Sep 11 00:23:02 prescott kernel: Call  
Trace:<ffffffff88497b5c>{:nvidia:_nv001456rm+376}  
<ffffffff885d0e6a>{:nvidia:_nv004524rm+48}  
Sep 11 00:23:02 prescott kernel:         
<ffffffff884aac6c>{:nvidia:_nv003623rm+116}  
<ffffffff8861e78c>{:nvidia:_nv003247rm+126}   
Sep 11 00:23:02 prescott kernel:         
<ffffffff885d1920>{:nvidia:_nv004556rm+68}  
<ffffffff885d16fe>{:nvidia:_nv004385rm+104}  
Sep 11 00:23:02 prescott kernel:         
<ffffffff884aaad4>{:nvidia:_nv001453rm+96}  
<ffffffff88582308>{:nvidia:_nv000393rm+20}  
Sep 11 00:23:02 prescott kernel:        
<ffffffff88582483>{:nvidia:_nv000397rm+125} 
<ffffffff884ad921>{:nvidia:_nv001426rm+141} 
Sep 11 00:23:02 prescott kernel:        
<ffffffff884ab512>{:nvidia:_nv001458rm+668} 
<ffffffff884ae8c4>{:nvidia:rm_init_adapter+104} 
Sep 11 00:23:02 prescott kernel:        
<ffffffff886a40b7>{:nvidia:nv_kern_open+581} 
<ffffffff80183553>{chrdev_open+307} 
Sep 11 00:23:02 prescott kernel:        <ffffffff80179df6>{dentry_open+246} 
<ffffffff80179f64>{filp_open+68} 
Sep 11 00:23:02 prescott kernel:        <ffffffff801791fa>{get_unused_fd+90} 
<ffffffff8017a002>{sys_open+82} 
Sep 11 00:23:02 prescott kernel:        <ffffffff80111a9d>{system_call+117} 
<ffffffff80111a28>{system_call+0} 
Sep 11 00:23:02 prescott kernel: 
Sep 11 00:23:02 prescott kernel: 
Sep 11 00:23:02 prescott kernel: Code: 0f 20 e0 c3 0f 20 d8 c3 53 48 89 cf 89 
f0 89 d1 0f a2 89 07 
Sep 11 00:23:02 prescott kernel: RIP <ffffffff884b0ed4>{:nvidia:_nv002491rm+0} 
RSP <ffff880028d11bc0> 
Sep 11 00:23:02 prescott kdm: :0[17622]: IO Error in XOpenDisplay 
Sep 11 00:23:02 prescott kdm[17614]: Display :0 cannot be opened 
Sep 11 00:23:02 prescott kdm[17614]: Unable to fire up local display :0; 
disabling. 
 
The machine survives, but you lose your console until the next reboot. 
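
The Code: line in the trace above starts with the bytes 0f 20 e0 c3 at the faulting RIP. As a cross-check, the sketch below decodes the x86 "mov from control register" encoding (opcode 0F 20 followed by a ModRM byte whose reg field selects the control register). The function name toy_decode_mov_from_cr is made up for this example, and it handles only this one opcode; it is not a general disassembler.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If code[] starts with a "mov %crN, reg" instruction (0F 20 /r),
 * return N, the control register number taken from bits 5..3 of the
 * ModRM byte. Return -1 for anything else.
 */
static int toy_decode_mov_from_cr(const uint8_t *code, size_t len)
{
    if (len < 3 || code[0] != 0x0f || code[1] != 0x20)
        return -1;

    return (code[2] >> 3) & 7;
}
```

Fed the bytes 0f 20 e0 it yields 4, that is, a read of cr4, which is consistent with the cr4 access described above. The following 0f 20 d8 yields 3 (cr3).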
Comment 12 Stefan Dirsch 2005-09-12 07:52:33 UTC
Ok. Let's reopen.
Comment 13 Kurt Garloff 2005-09-12 08:12:19 UTC
Created attachment 49567 [details]
nv-fix-gartaddr-xen.diff

Suggested patch. It disables PAT support on Xen by default and makes the
driver load fail under Xen unless overridden.
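
One possible reading of that policy, as a small C sketch. This is not taken from the attached patch: the struct, the function toy_decide and the xen_override flag are hypothetical, and only nv_disable_pat is a parameter name mentioned in this report. The sketch also simplifies "disabled by default" to "always disabled on Xen".

```c
#include <stdbool.h>

struct toy_load_decision {
    bool load_ok; /* may the module finish loading? */
    bool use_pat; /* should PAT entries be set up? */
};

/*
 * on_xen:        running on a Xen kernel
 * xen_override:  hypothetical switch that allows loading under Xen anyway
 * nv_disable_pat: the module parameter named in this report
 */
static struct toy_load_decision
toy_decide(bool on_xen, bool xen_override, bool nv_disable_pat)
{
    struct toy_load_decision d;

    /* Refuse to load under Xen unless explicitly overridden. */
    d.load_ok = !on_xen || xen_override;

    /* Never touch PAT (and thus cr4) under Xen in this simplified model. */
    d.use_pat = d.load_ok && !on_xen && !nv_disable_pat;

    return d;
}
```

In this model a native kernel loads with PAT enabled unless nv_disable_pat is set, a Xen kernel without the override refuses to load, and a Xen kernel with the override loads with PAT off.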
Comment 14 Stefan Dirsch 2005-09-12 08:55:27 UTC
Should be safe to use the new patch.
Comment 15 Stefan Dirsch 2005-09-12 09:07:02 UTC
Created attachment 49575 [details]
Regenerated consecutive patch

I'll take this one.
Comment 16 Stefan Dirsch 2005-09-12 09:11:15 UTC
New package submitted.
Comment 17 Stefan Dirsch 2005-09-12 14:07:34 UTC
Ok. I think it's time to hear a comment from NVIDIA. Andy, is Xen support
something NVIDIA is focused on?
Comment 18 andy ritger 2005-09-12 16:40:27 UTC
Sorry for the slow response.  I've been soliciting review from some of our
kernel engineers at NVIDIA.  I should have more information to post soon.

Any real technical issues aside, I'd be concerned about disabling PAT -- how
does Xen handle per-page cache attributes?

At this point, NVIDIA has no plans to support Xen.
Comment 19 Stefan Dirsch 2005-09-18 15:14:43 UTC
Setting to enhancement. 
Comment 20 Stefan Dirsch 2005-12-06 10:52:46 UTC
The last two hunks in nv-fix-gartaddr-xen.diff no longer apply with 1.0-8174, since nv_sg_map_buffer() and nv_sg_load() have moved to nv-vm.c and have changed. Since I'm not familiar with this patch and I don't want to break the driver, I'll disable this patch for now.
Comment 21 Stefan Dirsch 2005-12-06 10:55:14 UTC
Of course the consecutive patch no longer works either. I'll disable it for now.
Comment 22 Stefan Dirsch 2005-12-06 10:58:36 UTC
Kurt, in case you want to look at the patches, please use the nvidia-gfx-1_0_7676 package. I'll submit it ASAP and let you know.
Comment 23 Marius Tomaschewski 2006-02-24 18:09:28 UTC
JFYI: There are some efforts to get the nvidia drivers working
      with xen in the nvidia linux discussion forum:

http://www.nvnews.net/vbulletin/showthread.php?t=65198
http://www.nvnews.net/vbulletin/showthread.php?t=60125
Comment 26 Stefan Dirsch 2006-08-30 22:13:03 UTC
Lonni, Andy. 

Are there really no plans to support Xen in one of the next releases? More and more people are beginning to use it, including on their desktop machines ...
Comment 28 Stefan Dirsch 2007-05-15 06:11:21 UTC
*** Bug 274597 has been marked as a duplicate of this bug. ***
Comment 29 Stefan Dirsch 2007-09-04 20:21:31 UTC
*** Bug 307510 has been marked as a duplicate of this bug. ***
Comment 30 Marius Tomaschewski 2007-09-21 15:28:32 UTC
I found an interesting statement at:

http://www.nvnews.net/vbulletin/showthread.php?t=95483

zander, NVIDIA Corporation wrote:
---
No, this patch won't be included in future driver releases. Please note that although doing so is unsupported, 100.14.11 can be built against RHEL5's Xen kernels without patches if the IGNORE_XEN_PRESENCE environment variable is set to a non-zero value (you may also need to create an include2 directory in the top-level directory of the kernel development files and place an asm symlink to /usr/src/kernels/2.6.18-8.el5-xen-i686/include/asm-i386 in it). Mileage with other Xen kernels will vary. I do not believe the Xen patches posted for 100.14.11 are correct. I hope to take a look at providing one for Fedora Core 7, etc., at some point in the future.
---
Comment 31 Stefan Dirsch 2007-10-06 11:27:08 UTC
Comment #30 sounds interesting to me. I'm not sure what this means for SLES/openSUSE kernels though.
Comment 32 Stefan Dirsch 2008-01-12 20:09:00 UTC
*** Bug 353513 has been marked as a duplicate of this bug. ***
Comment 33 Stefan Dirsch 2008-05-28 14:32:46 UTC
I have finally decided to no longer track proprietary NVIDIA driver bugs
against openSUSE. Therefore I'm closing these now as WONTFIX.

In case you're using our SLES/SLED products and can reproduce this
issue on these products as well, feel free to reopen. These are still
tracked, since customers of these products depend on the proprietary
driver for newer NVIDIA hardware.

Be aware that you need a privileged account to track anything against
our SLES/SLED products. So if this is not an option for you, I suggest
reporting the problem to the official NVIDIA driver feedback channels
(forum/email; see NVIDIA driver download site) and referring to this
bug report.