Bug 105178

Summary: XEN doesn't work at all
Product: [openSUSE] SUSE LINUX 10.0
Reporter: Ladislav Slezák <lslezak>
Component: Kernel
Assignee: Kurt Garloff <garloff>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: aj, jbeulich, jlp, mt, vonhagen
Version: Beta 2
Target Milestone: ---
Hardware: Other
OS: All
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: /usr/sbin/xend strace while "xm create"
             Ugly hack for adding vifX.Y to the bridge...

Description Ladislav Slezák 2005-08-17 13:15:54 UTC
When I try to boot XEN in Beta2 I get only a blank screen; the system is frozen.
Comment 1 Marcel Buchholz 2005-08-18 12:17:46 UTC
*** Bug 105407 has been marked as a duplicate of this bug. ***
Comment 2 Kurt Garloff 2005-08-19 18:14:37 UTC
Yeah, unfortunately. 
Clyde, Jan, Charles, KY are working on it. 
I expect we'll have something that works in beta3. 
Comment 3 Kurt Garloff 2005-08-21 11:32:15 UTC
OK, we now have packages that should work. 
Mostly thanks to the hard work of Jan. 
Please have a look at 
http://www.suse.de/~garloff/linux/xen/RPMs-100/ 
Comment 4 Larry Nguyen 2005-08-21 13:55:41 UTC
What about amd64 systems? I understand from the FAQ that it's a work in progress.
But I saw an x64 RPM at the above URL. I downloaded it, and when booting with the
xen-i586 kernel, it complains that Dom0 is not a valid ELF.

I guess that makes sense, since I used the i586 kernel with xen-x64?
Comment 5 Kurt Garloff 2005-08-22 15:20:35 UTC
Without hardware virtualization support (VT/Pacifica), you need to use a 
64bit paravirtualized kernel on 64bit xen. 
Comment 6 Kurt Garloff 2005-08-24 13:32:56 UTC
Unfortunately, xen still does not work. 
It dies here with a kernel paging request in copy_pte_range() 
after the first fork in initrd :-( 
Comment 7 Ladislav Slezák 2005-08-24 14:59:59 UTC
I have the same experience.
Comment 8 Kurt Garloff 2005-08-24 15:54:56 UTC
I have a bit more information, if this helps someone to debug. 
 
The page fault occurs when writing to dst_pte in copy_one_pte. 
This happens on the second invocation of copy_page_range(). 
 
The first one deals with 0x1875 pages 
VM_(READ|EXECUTE|MAYREAD|MAYWRITE|MAYEXECUTE|DENYWRITE|EXECUTABLE) 
the second one with 0x101877 pages 
as above |VM_WRITE|VM_ACCOUNT 
 
The last pte written to the pmd by the first copy_page_range() in my 
debug run was 0xc0b86170 (contents 0x0805c000), then first one in the 
second copy_page_range would be 0xc0b86174 (contents 0x0805d000) which 
faults. 
 
This looks strange to me. If 0xc0b86170 is writable, so must be 
0xc0b86174. Unless someone changed the permissions on the pmd or unmapped 
it in between. I failed to spot where this happens, though :-( 
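
For readers following the numbers above, here is a minimal sketch that decodes the two vm_flags values and checks the pte slot arithmetic. The flag constants are an assumption: they are written down from 2.6-era include/linux/mm.h and should be verified against the tree being debugged. The slot calculation assumes 4 KiB pages and 4-byte ptes (32-bit, non-PAE), and it reads the quoted "contents" values as the virtual addresses being copied. The function names are made up for illustration.

```python
# Assumed VM_* values (2.6-era include/linux/mm.h); verify against your tree.
VM_FLAGS = {
    0x00000001: "VM_READ",
    0x00000002: "VM_WRITE",
    0x00000004: "VM_EXEC",
    0x00000008: "VM_SHARED",
    0x00000010: "VM_MAYREAD",
    0x00000020: "VM_MAYWRITE",
    0x00000040: "VM_MAYEXEC",
    0x00000080: "VM_MAYSHARE",
    0x00000800: "VM_DENYWRITE",
    0x00001000: "VM_EXECUTABLE",
    0x00100000: "VM_ACCOUNT",
}


def decode_vm_flags(flags):
    """Return the names of all known flag bits set in `flags`."""
    return [name for bit, name in sorted(VM_FLAGS.items()) if flags & bit]


def pte_offset(vaddr, pte_size=4):
    """Byte offset of the pte for `vaddr` inside its page table.

    Assumes 4 KiB pages and 1024 entries per table (non-PAE i386).
    """
    return ((vaddr >> 12) & 0x3FF) * pte_size


if __name__ == "__main__":
    first = set(decode_vm_flags(0x1875))
    second = set(decode_vm_flags(0x101877))
    print("first vma: ", sorted(first))
    print("extra bits:", sorted(second - first))
    for vaddr in (0x0805C000, 0x0805D000):
        print(hex(vaddr), "-> pte offset", hex(pte_offset(vaddr)))
```

Under these assumptions, the two addresses land in adjacent slots of the same page table, which matches the 0xc0b86170 / 0xc0b86174 pair reported above.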
Comment 9 Andreas Jaeger 2005-08-25 05:01:32 UTC
So, this means that Xen is still broken in Beta3?  Could you give me a summary
of what works - and what does not?
Comment 10 Ladislav Slezák 2005-08-25 09:06:55 UTC
I have just tested XEN in Beta3 with the same result - it's still broken.
Comment 11 Kurt Garloff 2005-08-25 12:57:45 UTC
But JanB found the bug in skas-for-arch-xen3 ... 
Have a look at 
http://www.suse.de/~garloff/linux/xen/RPMs-100/ 
The 6315 packages do boot on 32bit at least according to Jan. 
I'll test the 6393 packages RSN. 
Comment 12 Andreas Jaeger 2005-08-26 06:34:07 UTC
Kurt, do these work?  If they do, let's do an online update so that we test
update as well...
Comment 13 Ladislav Slezák 2005-08-26 07:51:00 UTC
Although there is progress, it's still not perfect.

I did a quick test yesterday - I was able to start domain0 with two major
problems: setting up the network failed (could not get IP via DHCP) and 'rcxend
start' printed an error message repeatedly and failed to start.
Comment 14 Marcel Ritter 2005-08-26 11:29:57 UTC
Same problem here (with version 6393) - no IP via DHCP, and it's impossible to
change memory usage (by "xm mem-set Domain-0 512") or to start a new (virtual)
machine with "xm create". Both result in an error message:

xen.xend.XendProtocol.XendError: Internal server error
or
Error: Internal server error

Btw: The YaST module does not set "kernel" and "root" when creating a virtual
machine. But I'm not quite sure whether this is really necessary.
Comment 15 Ladislav Slezák 2005-08-26 11:55:43 UTC
The yast module sets root after reboot of the VM (after the first installation
stage). Root device is not known at the first start.

Kernel should be set up correctly (at least I have no problem).
Comment 16 William von Hagen 2005-08-29 01:23:06 UTC
I am seeing the same problems as Marcel Ritter reported after installing the
RPMs  from 6393 - "Internal server error" when trying xm mem-set or xm create.
I'm trying this with a vm.conf file that I'd used successfully under 9.3. I'm
also seeing DHCP problems on the bridge device on the host machine. I just
thought I'd confirm that these still seem to be problems in the 6393 packages. 
It would also be nice if the YaST Virtual config module supported VM memory
sizes other than 256 or 512 - those seem to be the only selectable values, rather
than allowing you to specify a VM memory setting.
Comment 17 Ladislav Slezák 2005-08-29 07:02:29 UTC
The Yast module supports any memory size - the value is editable. There is no
limit, enter whatever you want. 256 and 512 MB are just two predefined values
for easier setup.

The disk image configuration dialog has the same behaviour.
Comment 18 Marius Tomaschewski 2005-08-29 12:46:14 UTC
I'm using Kurt's RPMs 6393 and have the same problem as Marcel and William.

The "xm create" (config created with yast2 module) reports internal error:

Xanthos:~ # xm create -c /etc/xen/domain1 vmid=1
Using config file "/etc/xen/domain1".
Error: Internal server error

See also strace output of the server that I will attach.

BTW: A "xm dmesg" shows:
(XEN) (file=/usr/src/packages/BUILD/xen-unstable/xen/include/asm/mm.h, line=201)
Error pfn 100: rd=ff1a6080, od=00000000, caf=00000000, taf=00000000
Comment 19 Marius Tomaschewski 2005-08-29 12:47:17 UTC
Created attachment 47987 [details]
/usr/sbin/xend strace while "xm create"
Comment 20 Marius Tomaschewski 2005-08-29 13:36:21 UTC
After a hack of /usr/lib/python2.4/site-packages/xen/web/SrvDir.py:

     def render_POST(self, req):
-        if self.isAuthorized(req):
-            return self.perform(req)
-        else:
-            return self.unauthPage(req, "You need admin power.")
+        return self.perform(req)
+        #if self.isAuthorized(req):
+        #else:
+        #    return self.unauthPage(req, "You need admin power.")

I was able to lower the memory in Domain-0 (xm mem-set 0 890) and
to create the domain1:

Xanthos:~ # xm list
Name              Id  Mem(MB)  CPU VCPU(s)  State  Time(s)
Domain-0           0      890    0      1   r----    461.8
Domain-1           2      256    3      1   r----    171.4

the kernel boots, but stops exactly as the kernel-xen from Beta3
while init is started.
I'll retry with the kernel from Kurt's 6393 RPMs...
Comment 21 Kurt Garloff 2005-08-29 14:06:19 UTC
The problem with mem-set is caused by my additional security checks. 
Those are not applicable if the connection is done via a socket rather 
than TCP. So I'll drop it. 
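
The distinction described above can be sketched as follows. This is not xend's code: `needs_admin_check`, `render_post`, and the way the socket family is passed in are all hypothetical. The sketch only illustrates skipping an authorization check for local Unix-domain socket connections while keeping it for TCP.

```python
import socket


def needs_admin_check(family):
    """Hypothetical policy: only non-local (e.g. TCP) connections are checked."""
    return family != socket.AF_UNIX


def render_post(family, is_authorized, perform, unauth_page):
    """Run `perform` unless the connection needs a check and fails it.

    `is_authorized`, `perform` and `unauth_page` are caller-supplied
    callables standing in for the methods seen in the hack above.
    """
    if not needs_admin_check(family) or is_authorized():
        return perform()
    return unauth_page("You need admin power.")


if __name__ == "__main__":
    deny = lambda: False
    run = lambda: "performed"
    page = lambda msg: msg
    print(render_post(socket.AF_UNIX, deny, run, page))
    print(render_post(socket.AF_INET, deny, run, page))
```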
Comment 22 Marius Tomaschewski 2005-08-29 17:00:12 UTC
I've tried to boot a system installed via "dir-installer" in a file
image as well as on LVM volume with the "file:" and "phy:" VBDs using
the kernel from 6393 RPMs in domain0 as well as in domain1.

I've tried many different combinations for the "disk" and "root" vars,
but it always fails to boot.

e.g. if I set:

disk = [ 'phy:/dev/xenvg1/domain1,hda1,w' ]
root = "/dev/hda1"

it fails with:

Waiting for device /dev/hda1 to appear:  ok
rootfs:  major=3 minor=1 devn=769
Mounting root /dev/hda1
mount: No such device
Kernel panic - not syncing: Attempted to kill init!

If I set the root = "", it fails with:

/sda1: error open volume
=: not found
Mounting root /sda1
Usage: mount [-r] [-w] [-o options] [-t type] device directory
Kernel panic - not syncing: Attempted to kill init!

I've tried with "hda", "sda", "hda1", "sda1", only relative path to
the lvm device, empty root variable, ...

What am I doing wrong - what's the right setting for the disk and root
vars in xen3 (6393 RPMs)?
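
As a side note on the boot log above, the `devn=769` value is consistent with the classic 8-bit major / 8-bit minor device number encoding. A minimal sketch of that arithmetic follows; the helper names are made up for illustration, and the sketch says nothing about why the mount itself fails.

```python
def make_devn(major, minor):
    """Combine major and minor using the old 8-bit/8-bit encoding."""
    return (major << 8) | minor


def split_devn(devn):
    """Inverse of make_devn for the old 8-bit/8-bit encoding."""
    return devn >> 8, devn & 0xFF


if __name__ == "__main__":
    print(make_devn(3, 1))
    print(split_devn(769))
```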
Comment 23 Kurt Garloff 2005-08-29 17:41:20 UTC
Updated packages, with working mem-set, are available on   
http://www.suse.de/~garloff/linux/xen/RPMs-100/6458/   
   
Marius, the phy path is relative to /dev/, so I would write 
phy:xenvg1/domain1. I fail to know whether absolute pathnames 
work as well. 
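
A small sketch of the rewrite suggested here, turning an absolute `phy:` path into one relative to /dev/. The helper `relative_phy` is hypothetical and only touches the `phy:` prefix; whether Xen itself accepts absolute paths is left open, as in the comment above.

```python
def relative_phy(spec):
    """Strip a leading /dev/ from the path of a 'phy:' disk spec.

    Specs with any other prefix (for example 'file:') are returned unchanged.
    """
    prefix = "phy:"
    if not spec.startswith(prefix):
        return spec
    # Split "path,device,mode" at the first comma and keep the rest intact.
    path, sep, rest = spec[len(prefix):].partition(",")
    if path.startswith("/dev/"):
        path = path[len("/dev/"):]
    return prefix + path + sep + rest


if __name__ == "__main__":
    disk = [relative_phy("phy:/dev/xenvg1/domain1,hda1,w")]
    print(disk)
```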
Comment 24 Kurt Garloff 2005-08-29 18:12:25 UTC
Xen mostly works for me on 32bit, downgrading to major. 
Comment 25 Marius Tomaschewski 2005-08-30 09:44:00 UTC
(In reply to comment #23)
> Marius, the phy path is relative to /dev/, so I would write 
> phy:xenvg1/domain1. I fail to know whether absolute pathnames 
> work as well. 

Yes, I've used relative paths, but since that didn't work, I've
also tried absolute paths.

After update to your 6458 RPMs it seems to work fine - now I've
a running VM booted from 'phy:xenvg1/domain1,hda1,w'.

Thanks!

Hmm... I've updated the initrd during the update... perhaps it was
simply broken...?
Comment 26 Kurt Garloff 2005-08-30 22:38:38 UTC
Good! 
Comment 27 Marius Tomaschewski 2005-08-31 15:15:59 UTC
There are several problems with network scripts in /etc/xen/scripts.

If I start the system with xend in domain0, the bridge is created
and network works fine (statically configured IP in domain0).

After starting a vm in domain1 the networking inside of the vm
doesn't work. I've to invoke the scripts/vif-bridge script with
the mac address I've configured in the domain1 config manually
(add vif1.0 to the bridge) to get it working.

Further, there are collisions between the xen network scripts and
the scripts used by the system. For example, the system network
scripts do not like the "xen-br0" name - they strip the "xen-"
from it.

After a "rcnetwork restart", both domain0 and domain1 aren't
working any more.

Further, a Python exception occurs:

Traceback (most recent call last):
  File "/usr/lib/python2.4/logging/handlers.py", line 62, in emit
    if self.shouldRollover(record):
  File "/usr/lib/python2.4/logging/handlers.py", line 132, in shouldRollover
    self.stream.seek(0, 2)  #due to non-posix-compliant Windows feature
ValueError: I/O operation on closed file

I'll install Beta4 and go on with testing.
Comment 28 Kurt Garloff 2005-08-31 23:44:07 UTC
Charles has done a fix to the network-bridge script, which is part of b4. 
And maybe we should change the default name for the bridge to xenbr0 
to work around the bugs^W limitations of the SUSE network scripts. 
This can be done in /etc/xen/xend.sxp. 
Please let me know whether this helps. 
Comment 29 Charles Coffing 2005-09-01 12:36:51 UTC
FWIW, I've seen all these issues too. 
 
Hopefully Kurt's suggestion will help with networking; when I get a minute I'll 
track down the python exceptions. 
 
When I've set this up, I've never gotten networking going within the domU.  
domU actually sees "vif1.0" and such, but not "eth0".  Anyone else seeing this 
behavior? 
 
Comment 30 Murlin Wenzel 2005-09-01 21:42:08 UTC
Clyde asked me to report what I am seeing with XEN.  So far just doing a simple
install with dom0, the 32bit version appears to be working ok.  The x86_64
version will install, boot the xen kernel, then within 2-3 minutes just hard
lock the system.  I've also seen during the install that the entry for xen in
grub has allocated all the system memory to dom0.  Without lowering the default
xen will not even start.  You just get a panic/reboot due to lack of free
memory.  This is all based on Beta4.
Comment 31 Bryan Perry 2005-09-02 15:00:08 UTC
I must be doing something wrong then. I have a default install of Beta 4 on a
Toshiba Tecra 9000, and I have not been able to get the machine to boot using
the xen grub entry. The machine gets most of the way through the boot process to
where you see the login prompt on the CLI just before gdm takes over, but then
the screen goes blank and I can not do anything else. I can't switch consoles or
anything.
Comment 32 Kurt Garloff 2005-09-05 21:39:43 UTC
Bryan, you may have one of the graphics cards that don't work in dom0 :-(  
If you boot into runlevel 3, your system will hopefully work stable.  
  
-- 
 
Charles, here's what to do for networking in domU, tested with 6610 from 
http://www.suse.de/~garloff/linux/xen/RPMs-100/6610/ 
# Config file 
nics=1 
# optional vif = [ 'mac=aa:cc:10:00:00:93, bridge=xenbr0' ] 
# The mac value is arbitrary of course, you don't need to set it, but then 
# you'll get a random value, which may not be what your dhcp setup prefers 
 
After starting the domain, you need to 
ip link set dev vifX.0 up 
brctl addif xenbr0 vifX.0 
 
The connectivity on domU is via the device called eth0. 
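
The two manual steps above can be wrapped in a small helper. This is only a sketch: it assumes that X in vifX.0 is the domain id shown by "xm list", that the bridge is called xenbr0 as in the config above, and the function names are made up. By default it only prints the commands; running them for real needs root.

```python
import subprocess


def vif_bridge_commands(domid, bridge="xenbr0", index=0):
    """Build the two commands from the steps above for one vif."""
    vif = "vif%d.%d" % (domid, index)
    return [
        ["ip", "link", "set", "dev", vif, "up"],
        ["brctl", "addif", bridge, vif],
    ]


def attach_vif(domid, bridge="xenbr0", dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in vif_bridge_commands(domid, bridge):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    attach_vif(2)
```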
Comment 33 Murlin Wenzel 2005-09-07 17:19:07 UTC
Now we're getting somewhere.  I updated to Kurt's 6610 xen build and my system
is staying up now.  I'm mainly running any stress tests I can think of against
dom0.  So far, bonnie does pretty well.
Comment 34 Charles Coffing 2005-09-08 19:34:22 UTC
x86 Linux crashes on startup now (xen 3.0_6004 / Linux 2.6.13-9-xen).  This 
seems to be related to the fact that SMP was just enabled for linux-xen. 
Comment 35 Marius Tomaschewski 2005-09-09 12:41:15 UTC
The kernel from 6644up works for me again.
Comment 36 Marius Tomaschewski 2005-09-09 12:42:54 UTC
Created attachment 49376 [details]
Ugly hack for adding vifX.Y to the bridge...
Comment 37 Marius Tomaschewski 2005-09-09 12:51:13 UTC
But it works for 2 vm's and since the vif's are removed from
the bridge and destroyed while "xm shutdown", ...
Comment 38 Ladislav Slezák 2005-09-13 10:32:06 UTC
I still have a problem with the network in RC2 - I cannot get the network
running properly.

# ip a
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: vif0.0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,NOTRAILERS,UP> mtu 1500 qdisc noqueue
    link/ether 00:00:1c:b5:5d:5d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::200:1cff:feb5:5d5d/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
5: peth0: <BROADCAST,MULTICAST,NOARP,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
6: xenbr0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::200:ff:fe00:0/64 scope link
       valid_lft forever preferred_lft forever

# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.feffffffffff       no              vif0.0
                                                        peth0

I have selected DHCP on eth0 interface in standard kernel during installation,
later after reboot into XEN the network doesn't work at all. How can I setup the
network correctly?
Comment 39 Marius Tomaschewski 2005-09-13 11:16:59 UTC
There were some problems using DHCP on the interface (eth0 by default),
that is added to the bridge. I don't know if this has changed on RC2.

In my setups I've configured eth0 to use static IPs (and eth1 to use
DHCP with DHCLIENT_PRIMARY_DEVICE='yes'). Then, it worked fine for me.
Comment 40 Ladislav Slezák 2005-09-13 14:19:40 UTC
Static IP configuration doesn't work for me. I have no idea what could be wrong,
I'm still unable to get the network running in domain0...
Comment 41 Marius Tomaschewski 2005-09-14 07:53:14 UTC
Disable xen backend (insserv -r), reboot and make sure that your network
configuration is working after the reboot. There should be 2 interfaces
eth0 (physical) and veth0 (virtual) after the reboot (without xend started).
You have to configure the eth0 interface using static IP and make sure it
works. You can rename the config file "ifcfg-eth-id-..." to "ifcfg-eth0"
but it should work without a rename as well.

As soon as you start xend, the interfaces will be reconfigured: xenbr0 will
be created, eth0 will be renamed to peth0 (and added to the bridge), veth0
to eth0. The interface eth0 (now virtual) should have the MAC and IP address
that you have configured for the physical eth0 before and network should work
again.

Do not restart the network using "rcnetwork restart" any more, it may destroy
the configuration.
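
To check that the bridge ended up in the state described above, the output of "brctl show" can be parsed. A minimal sketch, using the sample output quoted in comment 38 as input; the parser is an illustration that assumes the four-column layout shown there, not a tool shipped with Xen.

```python
SAMPLE = """bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.feffffffffff       no              vif0.0
                                                        peth0
"""


def parse_brctl_show(text):
    """Map each bridge name to the list of interfaces attached to it."""
    bridges = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():
            # New bridge row: name, id, STP flag, optional first interface.
            current = fields[0]
            bridges[current] = fields[3:]
        elif current is not None:
            # Continuation row: further interfaces of the current bridge.
            bridges[current].extend(fields)
    return bridges


if __name__ == "__main__":
    members = parse_brctl_show(SAMPLE).get("xenbr0", [])
    print("peth0 in bridge:", "peth0" in members)
```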
Comment 42 Ladislav Slezák 2005-09-22 09:40:33 UTC
Thanks! I still had problems with the network, it didn't work at all. But I
found that I need the 'acpi=off' option, then it works properly (see bug #116485).
Comment 43 Kurt Garloff 2005-12-20 20:47:48 UTC
Can you test again with the updates provided for SUSE Linux 10.0?
The network setup scripts have been improved significantly.
Comment 44 Lynn Bendixsen 2006-01-12 15:47:59 UTC
Please check the latest updates if you can, and let us know if this is still a bug.  Changing to needinfo.
Comment 45 Lynn Bendixsen 2006-01-26 16:56:36 UTC
Since the last comment from the submitter indicates that a different bug addresses the ongoing issue and the original issue has long since been fixed, I think it is appropriate to mark this bug resolved.