Bugzilla – Bug 146136
LTC21180-Unable to initialize qeth/dasd devices with CONFIG_DEBUG_SLAB enabled
Last modified: 2006-06-06 15:29:50 UTC
When building the kernel with CONFIG_DEBUG_SLAB, the dasd and qeth devices fail to come up. AFAIR there have been some alignment issues with CONFIG_DEBUG_SLAB on s390. I don't know whether this is really a qeth/dasd problem or a generic problem with buffers and I/O. Frank, I am still assigning this one to you since we already discussed the issue in bug 144973 (LTC20908- qeth: Kernel oops (NULL pointer dereference)).

The problem with the dasds is that sometimes they are not detected at all. This varies from rebuild to rebuild.

The problem with qeth shows up as follows when setting the device online:

s390vm01:/sys/bus/ccwgroup/drivers/qeth/0.0.0700 # echo 1 > online
qdio : received check condition on establish queues on irq 0.0.4 (cs=x20, ds=xc).
qdio : received check condition on activate queues on device 0.0.0702 (cs=x20, ds=xe).
qeth: Recovery of device 0.0.0700 started ...
qeth: Device 0.0.0700 could not be recovered!
qeth: sense data available on channel 0.0.0700.
qeth: cstat 0x0 dstat 0xE
qeth: irb: 00 c2 60 17 0c ba 10 48 0e 00 10 00 00 80 00 00
qeth: irb: 01 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qeth: sense data: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qeth: sense data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qeth: Recovery of device 0.0.0700 started ...
qdio : got interrupt for queues in state 3 on device 0.0.0702?!
qdio : got interrupt for queues in state 3 on device 0.0.0702?!
qeth: Initialization in hardsetup failed! rc=-5
qeth: Retrying to do IDX activates.
qdio : got interrupt for queues in state 3 on device 0.0.0702?!
s390vm01:/sys/bus/ccwgroup/drivers/qeth/0.0.0700 # qeth: Retrying to do IDX activates.
qdio : got interrupt for queues in state 3 on device 0.0.0702?!
qeth: Retrying to do IDX activates.
qdio : got interrupt for queues in state 3 on device 0.0.0702?!
qeth: Initialization in hardsetup failed! rc=-62
qeth: Device 0.0.0700 could not be recovered!
changed:

What     |Removed              |Added
----------------------------------------------------------------------------
Owner    |gjlynx@us.ibm.com    |h.carstens@de.ibm.com

------- Additional Comments From pavlic@de.ibm.com 2006-01-31 10:48 EDT -------
Reassigning this bugzilla to Heiko ...

Frank
changed:

What     |Removed                 |Added
----------------------------------------------------------------------------
Owner    |h.carstens@de.ibm.com   |pavlic@de.ibm.com
Severity |normal                  |low

------- Additional Comments From h.carstens@de.ibm.com (prefers email via heiko.carstens@de.ibm.com) 2006-02-14 14:09 EDT -------
Without deeper knowledge of QDIO I'm not able to debug this. Frank, why are these check conditions generated?
changed:

What     |Removed   |Added
----------------------------------------------------------------------------
CC       |          |cborntra@de.ibm.com

------- Additional Comments From cborntra@de.ibm.com 2006-04-10 07:55 EDT -------
I have no clue about the dasds (you don't mean fcp devices?), but it seems that I have found the alignment problem in qdio.c. The qib structure must be aligned to 256 bytes AND is embedded in the qdio_irq structure. That alignment cannot be guaranteed with slab debugging. If you force the qdio_irq structure onto a page boundary, qeth works for me with CONFIG_DEBUG_SLAB. See the patch below (the whitespace is broken due to cut and paste into this bugzilla):

diff -u -p -r1.3 qdio.c
--- drivers/s390/cio/qdio.c 4 Apr 2006 07:25:26 -0000 1.3
+++ drivers/s390/cio/qdio.c 10 Apr 2006 11:51:54 -0000
@@ -1637,7 +1637,7 @@ next:
 }
 kfree(irq_ptr->qdr);
- kfree(irq_ptr);
+ free_page((unsigned long) irq_ptr);
 }
 static void
@@ -2984,7 +2984,7 @@ qdio_allocate(struct qdio_initialize *in
 qdio_allocate_do_dbf(init_data);
 /* create irq */
- irq_ptr=kmalloc(sizeof(struct qdio_irq), GFP_KERNEL | GFP_DMA);
+ irq_ptr=(void *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
 QDIO_DBF_TEXT0(0,setup,"irq_ptr:");
 QDIO_DBF_HEX0(0,setup,&irq_ptr,sizeof(void*));
@@ -2994,14 +2994,13 @@ qdio_allocate(struct qdio_initialize *in
 return -ENOMEM;
 }
- memset(irq_ptr,0,sizeof(struct qdio_irq));
 init_MUTEX(&irq_ptr->setting_up_sema);
 /* QDR must be in DMA area since CCW data address is only 32 bit */
 irq_ptr->qdr=kmalloc(sizeof(struct qdr), GFP_KERNEL | GFP_DMA);
 if (!(irq_ptr->qdr)) {
- kfree(irq_ptr);
+ free_page((unsigned long) irq_ptr);
 QDIO_PRINT_ERR("kmalloc of irq_ptr->qdr failed! ");
 return -ENOMEM;
 }

Let me know if this patch works.
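To illustrate the alignment argument in the comment above, here is a minimal user-space sketch in C. It is not kernel code: `struct fake_irq`, `struct fake_qib`, `alloc_on_page()` and `place_with_debug_prefix()` are hypothetical names, the layout is not the real one from qdio.c, and the 8-byte prefix is only an assumed stand-in for whatever bookkeeping a debugging allocator may put in front of an object. It shows that an embedded member keeps its 256-byte alignment when the containing structure starts on a page boundary, and loses it when the start is shifted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define QIB_ALIGN 256u   /* alignment the qib is said to require */
#define PAGE_SZ   4096u  /* assumed page size for this sketch */

/* Hypothetical stand-ins; NOT the real layout from qdio.c. */
struct fake_qib { unsigned char bytes[128]; };

struct fake_irq {
    struct fake_qib qib;        /* embedded member that needs 256-byte alignment */
    unsigned char other[512];   /* rest of the structure */
};

/* Returns non-zero when p is a multiple of align. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Models a zeroed page allocation: the object starts on a page boundary. */
static struct fake_irq *alloc_on_page(void)
{
    void *p;

    assert(sizeof(struct fake_irq) <= PAGE_SZ);
    p = aligned_alloc(PAGE_SZ, PAGE_SZ);
    if (p)
        memset(p, 0, PAGE_SZ);
    return p;
}

/*
 * Models an allocator that shifts the object by `prefix` bytes, e.g. to make
 * room for debugging data in front of it (assumption for illustration only).
 */
static struct fake_irq *place_with_debug_prefix(unsigned char *page, size_t prefix)
{
    assert(prefix + sizeof(struct fake_irq) <= PAGE_SZ);
    return (struct fake_irq *)(page + prefix);
}
```

With a page-aligned base, `is_aligned(&irq->qib, QIB_ALIGN)` holds. With the same structure placed 8 bytes into the page it does not, which is the kind of shift the patch avoids by allocating a whole page.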
Created attachment 77517 [details] slab2616.diff
changed:

What     |Removed             |Added
----------------------------------------------------------------------------
Owner    |pavlic@de.ibm.com   |cborntra@de.ibm.com

------- Additional Comments From cborntra@de.ibm.com 2006-04-10 07:58 EDT -------
Patch against 2.6.16 which makes qeth work with CONFIG_DEBUG_SLAB. Please test this patch and let me know if it works. We can then make an official patch.
Hello Jan, I am assigning this bugzilla back to you. Can you please test whether the attached patch resolves the problem in this bugzilla? Thanks in advance for your support.
Created attachment 80349 [details] x3270 log

Thanks for the patch. I have good and bad news. The QDIO issue seems to be fixed. But the SLAB debugger found the following bug in our latest kernel:

Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000
Oops: 0038 [#1]
CPU: 0 Not tainted
Process ifup (pid: 1211, task: 00000000011a2150, ksp: 000000000d56bd88)
Krnl PSW : 0704200180000000 0000000010a983f6 (qeth_hard_start_xmit+0x1dda/0x2218 [qeth])
Krnl GPRS: 0000000000000006 6b6b6b6b6b6b6ba5 0000000000000001 0000000000000000
           0000000010a97c94 0000000000000001 000000000f8a8000 000000000f8a9c10
           0000000000e57000 000000000f8a0000 0000000000000000 000000000f5b9bd0
           0000000010a8e000 0000000010abb948 0000000010a97c94 00000000012b1b20
Krnl Code: bf 43 10 06 a7 84 00 13 58 50 f0 f0 12 55 a7 84 00 0e 58 10
Call Trace:
([<0000000010a97c94>] qeth_hard_start_xmit+0x1678/0x2218 [qeth])
 [<00000000003907fa>] qdisc_restart+0x13e/0x280
 [<0000000000375376>] dev_queue_xmit+0x496/0x718
 [<0000000010a443f8>] mld_sendpack+0x32c/0x4ec [ipv6]
 [<0000000010a49196>] mld_ifc_timer_expire+0x316/0x3c0 [ipv6]
 [<00000000001549f4>] run_timer_softirq+0x660/0x704
 [<0000000000149950>] __do_softirq+0x6c/0x108
 [<000000000010f226>] do_softirq+0xba/0xf4
 [<0000000000110034>] ext_no_vtime+0x16/0x1a
 [<00000000001c01ae>] do_wp_page+0x10e/0x4e0
([<00000000001c0184>] do_wp_page+0xe4/0x4e0)
 [<00000000001c753c>] __handle_mm_fault+0xcc4/0xdcc
 [<0000000000101a98>] do_protection_exception+0x1c0/0x450
 [<000000000010f95a>] sysc_return+0x0/0x10
 [<0000020000180138>] 0x20000180138
<0>Kernel panic - not syncing: Fatal exception in interrupt
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00103A44
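The faulting address and one of the registers in the oops above consist almost entirely of the byte 0x6b, which is the value the slab debugging code writes into freed objects. As a rough aid for reading such dumps, the following C sketch counts poison bytes in a 64-bit value. The helper names and the threshold of six bytes are assumptions made for this illustration; they are not taken from the kernel.

```c
#include <stdint.h>

/* Byte pattern written into freed objects when slab debugging is enabled. */
#define POISON_FREE_BYTE 0x6bu

/* Count how many of the eight bytes of v equal the free-poison byte. */
static int poison_bytes(uint64_t v)
{
    int n = 0;
    int i;

    for (i = 0; i < 8; i++) {
        if (((v >> (8 * i)) & 0xffu) == POISON_FREE_BYTE)
            n++;
    }
    return n;
}

/*
 * Heuristic (threshold chosen arbitrarily for this sketch): a value made up
 * mostly of poison bytes was probably loaded from an already freed object.
 */
static int looks_like_freed(uint64_t v)
{
    return poison_bytes(v) >= 6;
}
```

Applied to the dump, `looks_like_freed(0x6b6b6b6b6b6b6000ULL)` is true for the faulting address, while an ordinary kernel pointer such as `0x000000000f8a8000ULL` is not flagged. This suggests a use-after-free rather than a random wild pointer.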
First look at the problem: This seems to be in qeth_send_packet() (line 4506 in qeth_main.c):

        rc = qeth_do_send_packet_fast(card, queue, skb, hdr,
                                      elements_needed, ctx);
        if (!rc){
                card->stats.tx_packets++;
                card->stats.tx_bytes += tx_bytes;
#ifdef CONFIG_QETH_PERF_STATS
                if (skb_shinfo(skb)->tso_size &&        <======= here
                    !(large_send == QETH_LARGE_SEND_NO)) {
                        card->perf_stats.large_send_bytes += skb->len;
                        card->perf_stats.large_send_cnt++;
                }
                if (skb_shinfo(skb)->nr_frags > 0){
                        card->perf_stats.sg_skbs_sent++;
                        /* nr_frags + skb->data */
                        card->perf_stats.sg_frags_sent +=
                                skb_shinfo(skb)->nr_frags + 1;
                }
#endif /* CONFIG_QETH_PERF_STATS */
        }
        if (ctx != NULL) {
                /* drop creator's reference */
                qeth_eddp_put_context(ctx);

I looked into all the skb handling in qeth_do_send_packet(_fast) but I don't see why the shinfo is already freed. Frank, can you take a look?
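The snippet above reads fields of the buffer after it has been handed to the send routine. One common way to avoid this class of bug is to copy the needed fields into locals before the hand-off. The user-space sketch below shows that pattern only. It is not the actual fix (the thread does not show the patch that resolved this), and `struct fake_buf`, `struct fake_stats`, `fake_send()` and `send_and_account()` are hypothetical names introduced for the illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a packet buffer and driver statistics. */
struct fake_buf { size_t len; unsigned int nr_frags; };
struct fake_stats { unsigned long tx_packets, tx_bytes, sg_frags_sent; };

/*
 * Hypothetical send routine that takes ownership of the buffer and frees it.
 * The memset imitates a debugging allocator poisoning freed memory.
 */
static int fake_send(struct fake_buf *b)
{
    memset(b, 0x6b, sizeof(*b));
    free(b);
    return 0;
}

/* Snapshot everything needed for accounting BEFORE handing the buffer off. */
static int send_and_account(struct fake_buf *b, struct fake_stats *st)
{
    size_t len = b->len;
    unsigned int frags = b->nr_frags;
    int rc = fake_send(b);      /* b must not be dereferenced after this */

    if (!rc) {
        st->tx_packets++;
        st->tx_bytes += len;
        if (frags > 0)
            st->sg_frags_sent += frags + 1;   /* fragments plus linear data */
    }
    return rc;
}
```

The quoted code already does this for the byte count (`tx_bytes` is a local), so the sketch merely extends the same idea to the remaining fields used for statistics.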
Christian, thanks for the fix. The problem with CONFIG_DEBUG_SLAB is fixed by your patch. The second problem is fixed by IBM Codestream linux-2.6.16 october2005 patch 02-19, thanks to Frank. Both patches are in CVS.
------- Additional Comments From cborntra@de.ibm.com 2006-05-08 06:57 EDT -------
Yes, I have found the same problem in qeth with slab debugging. qeth should work now with slab debugging. The slab debugging fix is now upstream as well.
changed:

What     |Removed    |Added
----------------------------------------------------------------------------
Status   |ACCEPTED   |CLOSED
Impact   |------     |RAS

------- Additional Comments From cborntra@de.ibm.com 2006-05-16 11:27 EDT -------
The slab debugging fix is in SLES10 RC1.
Closed.