Bug 1198753 - Since kernel 5.17 smb client does not work
Status: CONFIRMED
Duplicates: 1202463
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Importance: P5 - None : Major
Target Milestone: ---
Assigned To: Enzo Matsumiya
QA Contact: E-mail List
Depends on:
Blocks:
Reported: 2022-04-22 09:21 UTC by Sergio Lindo
Modified: 2022-08-19 20:53 UTC
CC List: 8 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
ematsumiya: needinfo? (sergiolindo.empresa)


Attachments
linux-5.17-samba-cifs.journal.log (80.72 KB, text/x-log)
2022-05-02 07:46 UTC, Sergio Lindo
linux-5.17.4-samba-cifs.nodfs.journal.log (23.61 KB, text/x-log)
2022-05-09 07:54 UTC, Sergio Lindo
linux-5.17.4-samba-cifs.nodfs.journal.log (23.62 KB, text/x-log)
2022-05-09 07:56 UTC, Sergio Lindo
linux-5.16.15-samba-cifs.journal.log (38.19 KB, text/x-log)
2022-05-09 07:58 UTC, Sergio Lindo
Packet traces (17.32 MB, application/x-zip-compressed)
2022-08-17 20:38 UTC, Ben Walter

Description Sergio Lindo 2022-04-22 09:21:22 UTC
# Devices

The Samba servers are Synology NAS drives.
Model: Synology DS218
OS: DSM 7.0.1-42218 Update 3

The clients are Windows 10, macOS Monterey, openSUSE Leap 15.2 and openSUSE Tumbleweed 20220420.

The Windows 10, macOS and openSUSE Leap 15.3 clients can access the shared Samba folder.

openSUSE Tumbleweed has failed to access Samba shares since kernel 5.17.


# Gnome - Nautilus

Typing the share location in the GNOME Nautilus location bar only shows:

```
Oops! Something went wrong.

Unhandled error message: Failed to retrieve share list from server: Invalid argument
```

There is no message in either the systemd journal or dmesg.


# autofs

When trying to ls a directory managed by autofs, the systemd journal and dmesg show:
kernel: CIFS: Attempting to mount \\archives.urbangames.local\public
kernel: CIFS: VFS: cifs_mount failed w/return code = -40
Comment 1 Sergio Lindo 2022-04-22 09:22:38 UTC
Until kernel 5.16 was purged, I could boot it and still use both Nautilus and autofs.

Now kernel 5.16 has been purged and is no longer available from the official repository.
Comment 2 Takashi Iwai 2022-04-22 12:40:59 UTC
You can still find the old kernel package in TW history repo,
  http://download.opensuse.org/history/

So please confirm that it's working with the latest 5.16.x.
Comment 3 Sergio Lindo 2022-04-22 13:45:13 UTC
Thanks Takashi,

I have found that the last Tumbleweed snapshot that provides a 5.16 version was 20220403.

But when I add the URL http://download.opensuse.org/history/20220403/tumbleweed/repo/oss/ as a repo, zypper/yast-sw_single cannot find any kernel below 5.17.

Is that a correct repo URL? Do I need to do something extra?
Comment 4 Sergio Lindo 2022-04-22 13:56:13 UTC
Well, I have installed it manually and booted from it, and I can confirm it works again.
Comment 5 Sergio Lindo 2022-04-22 13:58:36 UTC
So, the working version is kernel-default-5.16.15-1.1.
Comment 6 Takashi Iwai 2022-04-22 14:05:01 UTC
(In reply to Sergio Lindo from comment #4)
> Well, I have installed it manually and booted from it, and I can confirm it
> works again.

Yes, manually downloading the package and installing it is the easiest way for the kernel :)

Assuming this is a CIFS kernel driver issue, reassigning to Enzo.
Comment 7 Enzo Matsumiya 2022-04-25 15:07:06 UTC
Please share the mount options for \\archives.urbangames.local\public, either fstab or manual is fine.

As for Nautilus, CIFS is usually not involved, but I'm not sure how that works with autofs in play.
Comment 8 Sergio Lindo 2022-04-25 16:00:45 UTC
I don't use fstab.
Here is the autofs setup.

$ cat /etc/auto.master
+auto.master
/mnt/smb/archives /etc/autofs/auto.archives.smb --timeout 60 --browse
$ cat /etc/
public -fstype=cifs,auto,credentials=samba-credentials.archives.txt,rw,uid=1000,gid=100,file_mode=0664,dir_mode=0775,iocharset=iso8859-15,vers=3.0 ://archives.urbangames.local/public
Comment 9 Sergio Lindo 2022-04-25 16:02:04 UTC
(In reply to Sergio Lindo from comment #8)
> I don't use fstab.
> Here is the autofs setup.
> 
> $ cat /etc/auto.master
> +auto.master
> /mnt/smb/archives /etc/autofs/auto.archives.smb --timeout 60 --browse
> $ cat /etc/
> public
> -fstype=cifs,auto,credentials=samba-credentials.archives.txt,rw,uid=1000,
> gid=100,file_mode=0664,dir_mode=0775,iocharset=iso8859-15,vers=3.0
> ://archives.urbangames.local/public

$ cat /etc/autofs/auto.archives.smb
public -fstype=cifs,auto,credentials=samba-credentials.archives.txt,rw,uid=1000,gid=100,file_mode=0664,dir_mode=0775,iocharset=iso8859-15,vers=3.0 ://archives.urbangames.local/public
Comment 10 Enzo Matsumiya 2022-04-25 17:32:43 UTC
(In reply to Sergio Lindo from comment #9)
> $ cat /etc/autofs/auto.archives.smb
> public
> -fstype=cifs,auto,credentials=samba-credentials.archives.txt,rw,uid=1000,
> gid=100,file_mode=0664,dir_mode=0775,iocharset=iso8859-15,vers=3.0
> ://archives.urbangames.local/public

What happens if you try to manually mount the share with those options? Do you also see cifs_mount fail with -40 (i.e. -ELOOP)?

Is that share a DFS share?
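The -40 in "cifs_mount failed w/return code = -40" is a negated Linux errno value. A minimal Python sketch for decoding such return codes with the standard errno and os modules; the helper name describe_rc is hypothetical, and the numeric values assume a Linux system (errno numbers differ between platforms):

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Decode a negative kernel return code (e.g. -40) into its errno name."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} = -{name} ({os.strerror(code)})"


# Errno numbers are platform-specific; on Linux -40 is -ELOOP and -66 is -EREMOTE.
for rc in (-40, -66):
    print(describe_rc(rc))
```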
Comment 11 Enzo Matsumiya 2022-04-25 18:03:40 UTC
Please provide CIFS debugging messages with a failing action:

<umount all cifs shares, systemctl stop autofs>
# rmmod cifs
# modprobe cifs
# dmesg -C
# echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
# echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
# echo 1 > /proc/fs/cifs/cifsFYI

<perform mount/ls on cifs share>

# echo 'module cifs -p' > /sys/kernel/debug/dynamic_debug/control
# echo 'file fs/cifs/* -p' > /sys/kernel/debug/dynamic_debug/control
# echo 0 > /proc/fs/cifs/cifsFYI

Then attach the logs here.
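The echo steps above can also be scripted. A minimal Python sketch, assuming it runs as root with debugfs mounted at /sys/kernel/debug; the function name set_cifs_debug is hypothetical, the paths are parameters so they can be redirected for testing, and the rmmod/modprobe and unmount steps are not covered:

```python
from pathlib import Path

DYNDBG_CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")
CIFS_FYI = Path("/proc/fs/cifs/cifsFYI")


def set_cifs_debug(enable: bool,
                   control: Path = DYNDBG_CONTROL,
                   fyi: Path = CIFS_FYI) -> None:
    """Toggle cifs dynamic debug and cifsFYI, mirroring the echo commands."""
    flag = "+p" if enable else "-p"
    # One open/write per command, like separate `echo ... > control` calls.
    for command in (f"module cifs {flag}", f"file fs/cifs/* {flag}"):
        with open(control, "w") as f:
            f.write(command + "\n")
    # Equivalent of `echo 1 > /proc/fs/cifs/cifsFYI` (or 0 to disable).
    fyi.write_text("1\n" if enable else "0\n")
```

Call set_cifs_debug(True) before reproducing the failing mount or ls, and set_cifs_debug(False) afterwards.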
Comment 12 Sergio Lindo 2022-05-02 07:46:30 UTC
Created attachment 858557 [details]
linux-5.17-samba-cifs.journal.log

(In reply to Enzo Matsumiya from comment #10)
> What happens if you try to manually mount the share with those options? Do
> you also see cifs_mount fail with -40 (i.e. -ELOOP)?

Yes, it also fails with -40.


> Is that share a DFS share?

No.


> Please provide CIFS debugging messages with a failing action:
> Then attach the logs here.

Attached.
Comment 13 Enzo Matsumiya 2022-05-02 18:19:15 UTC
(In reply to Sergio Lindo from comment #12)
> (In reply to Enzo Matsumiya from comment #10)
> > What happens if you try to manually mount the share with those options? Do
> > you also see cifs_mount fail with -40 (i.e. -ELOOP)?
> 
> Yes, it also fails with -40.
> 
> > Is that share a DFS share?
> 
> No.

As I suspected, cifs is interpreting the share as a DFS one:

> CIFS: fs/cifs/connect.c: mount_get_dfs_conns: marking tcp session as a dfs connection
> CIFS: fs/cifs/connect.c: build_unc_path_to_root: full_path=\\archives.urbangames.local\public
> CIFS: fs/cifs/connect.c: connect_dfs_target: full_path=\\archives.urbangames.local\public ref_path=\archives.urbangames.local\public target=\archives.urbangames.local\public
> CIFS: fs/cifs/dfs_cache.c: dfs_cache_get_tgt_referral: path: \archives.urbangames.local\public
> CIFS: fs/cifs/dfs_cache.c: dfs_cache_get_tgt_referral: target name: \archives.urbangames.local\public
> CIFS: fs/cifs/dfs_cache.c: setup_referral: set up new ref

Once it has set up a DFS cache entry and marked the connection as DFS, with the referral path and the target both pointing to the same share, it keeps trying to connect to it, getting EREMOTE each time, until -ELOOP is returned.

Can you try mounting the share with 'nodfs' mount option so we can validate this theory?

I can see in the code how this is happening, but I still don't understand *why*. This is probably related to how the Synology server is implemented, since I can't reproduce this with either Samba or Windows Server 2019/2022.

> > Please provide CIFS debugging messages with a failing action:
> > Then attach the logs here.
> 
> Attached.

Thanks.
Comment 14 Enzo Matsumiya 2022-05-02 20:47:03 UTC
(In reply to Enzo Matsumiya from comment #13)
> As I suspected, cifs is interpreting the share as a DFS one:

Just clarifying: cifs interprets it as a DFS share because the server sends all the information that constitutes a DFS share, even though you probably did not enable DFS in the server's interface.
Comment 15 Paulo Alcantara 2022-05-02 23:46:57 UTC
You claim it isn't a DFS share, even though the server is returning a DFS root referral of \\archives.urbangames.local\public, which turns out to be a standalone DFS namespace.

The client successfully tree connects to the \archives.urbangames.local\public share, then sends an SMB2_CREATE request for the root of the share to check whether it is accessible, but receives STATUS_OBJECT_NAME_INVALID, which eventually gets mapped to -EREMOTE due to this commit:

  commit a2809d0e16963fdf3984409e47f145cccb0c6821
  Author: Eugene Korenevsky <ekorenevsky@astralinux.ru>
  Date:   Fri Jan 14 22:53:40 2022 +0300
  
      cifs: quirk for STATUS_OBJECT_NAME_INVALID returned for non-ASCII dfs refs
  
      Windows SMB server responds with STATUS_OBJECT_NAME_INVALID code to
      SMB2 QUERY_INFO request for "\<server>\<dfsname>\<linkpath>" DFS reference,
      where <dfsname> contains non-ASCII unicode symbols.
  
      Check such DFS reference and emulate -EREMOTE if it is actual.
  
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215440
      Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
      Signed-off-by: Steve French <stfrench@microsoft.com>

Therefore, the client loops in __follow_dfs_link(), getting the cached DFS referral and STATUS_OBJECT_NAME_INVALID (mapped to -EREMOTE) over and over.
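The loop described above can be illustrated with a deliberately simplified Python model. It is not the kernel code: check_root, follow_referrals and the hop limit MAX_HOPS are hypothetical, the real logic in __follow_dfs_link() and in the quirk commit is more involved, and only the STATUS_OBJECT_NAME_INVALID value (0xC0000033) and the errno constants are real:

```python
import errno

STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_INVALID = 0xC0000033

MAX_HOPS = 8  # hypothetical limit for this model only


def check_root(status: int, share_is_dfs: bool) -> int:
    """Model the quirk: on a DFS share, STATUS_OBJECT_NAME_INVALID becomes -EREMOTE."""
    if status == STATUS_SUCCESS:
        return 0
    if share_is_dfs and status == STATUS_OBJECT_NAME_INVALID:
        return -errno.EREMOTE
    return -errno.EIO  # stand-in for any other error in this model


def follow_referrals(path: str, referrals: dict, status_by_path: dict,
                     share_is_dfs: bool = True) -> int:
    """Follow cached referrals while the root check keeps reporting -EREMOTE."""
    for _ in range(MAX_HOPS):
        rc = check_root(status_by_path[path], share_is_dfs)
        if rc != -errno.EREMOTE:
            return rc
        path = referrals[path]  # next target from the cached referral
    return -errno.ELOOP


share = r"\archives.urbangames.local\public"
referrals = {share: share}  # referral target == referral path
print(follow_referrals(share, referrals, {share: STATUS_OBJECT_NAME_INVALID}))
```

Because the referral points back to the same share and the server keeps answering STATUS_OBJECT_NAME_INVALID, every hop yields -EREMOTE and the model gives up with -ELOOP, matching the -40 in the original log.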

You probably want to use 'nodfs', as per mount.cifs(8):

       nodfs  Do not follow Distributed FileSystem referrals. IO on a file not
              stored on the server will fail instead of connecting to the target
              server transparently.
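With the autofs setup from comment 9, nodfs goes into the comma-separated option list of the map entry. A minimal Python sketch (the helper add_mount_option is hypothetical) that appends an option to a one-line "key -options location" map entry; it assumes the option list contains no spaces, since autofs splits map fields on whitespace:

```python
def add_mount_option(map_line: str, option: str) -> str:
    """Append a mount option to a one-line autofs map entry 'key -opts location'."""
    fields = map_line.split()
    if len(fields) != 3 or not fields[1].startswith("-"):
        raise ValueError("expected 'key -options location' with no spaces in options")
    key, opts, location = fields
    options = opts[1:].split(",")  # drop the leading '-' and split the list
    if option not in options:
        options.append(option)
    return f"{key} -{','.join(options)} {location}"


entry = ("public -fstype=cifs,auto,credentials=samba-credentials.archives.txt,rw,"
         "uid=1000,gid=100,file_mode=0664,dir_mode=0775,iocharset=iso8859-15,vers=3.0 "
         "://archives.urbangames.local/public")
print(add_mount_option(entry, "nodfs"))
```

The helper is idempotent, so running it on an entry that already has nodfs returns the entry unchanged.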

In any case, we'll investigate which commit caused the regression and then let you know.

Please provide verbose logs and network traces of working and non-working kernels.
Comment 16 Sergio Lindo 2022-05-09 07:54:29 UTC
Created attachment 858746 [details]
linux-5.17.4-samba-cifs.nodfs.journal.log

(In reply to Paulo Alcantara from comment #15)
> You might probably want to use 'nodfs' as per mount.cifs(8):
>
> Please provide verbose logs and network traces of working and non-working
> kernels.

Attached
Comment 17 Sergio Lindo 2022-05-09 07:56:08 UTC
Created attachment 858747 [details]
linux-5.17.4-samba-cifs.nodfs.journal.log

(In reply to Paulo Alcantara from comment #15)
> You might probably want to use 'nodfs' as per mount.cifs(8):
>
> Please provide verbose logs and network traces of working and non-working
> kernels.

Attached
Comment 18 Sergio Lindo 2022-05-09 07:58:11 UTC
Created attachment 858748 [details]
linux-5.16.15-samba-cifs.journal.log

(In reply to Paulo Alcantara from comment #15)
> Please provide verbose logs and network traces of working and non-working
> kernels.

Attached
Comment 19 Enzo Matsumiya 2022-05-10 20:42:21 UTC
@Sergio, can you also provide a tcpdump of the failing and working mount operations, please?
Comment 20 Enzo Matsumiya 2022-05-12 19:52:55 UTC
(In reply to Paulo Alcantara from comment #15)
> You claim it isn't a DFS share, even though the server is returning a DFS
> root referral of \\archives.urbangames.local\public, which turns out to be a
> standalone DFS namespace.

@Sergio, I went through the Synology DSM 7.1 spec and it seems that the SMB module's MSDFS VFS setting was merged into something called "SMB aggregation portal".

I don't know the internals, but it seems to me that, if enabled, it will treat all shares as DFS shares, even if you don't explicitly use it.

You can check/disable as per https://kb.synology.com/en-global/DSM/help/DSM/AdminCenter/file_aggregate_portal?version=7

Can you confirm whether you have it enabled and if disabling it makes the shares mount again please?
Comment 21 Sergio Lindo 2022-05-16 11:11:13 UTC
(In reply to Enzo Matsumiya from comment #20)
>
> @Sergio I went through Synology DSM 7.1 spec and it seems that the SMB
> module's MSDFS VFS setting was merged into something called "SMB aggregation
> portal".
> 
> I don't know the internals, but it seems to me that, if enabled, it will
> treat all shares as DFS shares, even if you don't explicitly use it.
> 
> You can check/disable as per
> https://kb.synology.com/en-global/DSM/help/DSM/AdminCenter/
> file_aggregate_portal?version=7
> 
> Can you confirm whether you have it enabled and if disabling it makes the
> shares mount again please?

Hi, I can confirm that after disabling that option I can access the share with kernel 5.17.5-1-default.

Do you still need the tcpdumps?
Comment 22 Enzo Matsumiya 2022-05-16 15:21:14 UTC
(In reply to Sergio Lindo from comment #21)
> Hi, I can confirm that disabling that option, I can access the share with
> kernel 5.17.5-1-default

Great news then, thanks for confirming!

> Do you still need tcpdump's?

Yes, please. It's certain that Synology is modifying the path string to something else when that aggregation portal thing is enabled.

We need the tcpdumps to identify what they're sending to the client so we can adjust the code in case users want to connect to their Synology boxes.
Comment 23 Enzo Matsumiya 2022-06-13 15:23:11 UTC
@Sergio will you be able to provide the pcaps?

If you do, I can evaluate the possibility of a workaround in the cifs code to accommodate this behaviour. Otherwise I'll just close this as invalid.

Either way, I suggest you open a ticket with Synology with all the gathered data so they can fix it on their end.
Comment 24 Sergio Lindo 2022-06-13 15:36:46 UTC
Hi Enzo,

Yes, it is in my TO DO list.

Maybe I will have time next week... :(
Comment 25 Sergio Lindo 2022-06-17 08:15:44 UTC
Hi Enzo,

Which command options and filters should I use for the tcpdump capture?
Comment 26 Enzo Matsumiya 2022-06-17 15:21:40 UTC
(In reply to Sergio Lindo from comment #25)
> Hi Enzo,
> 
> Which command options and filters should I use for the tcpdump capture?

# tcpdump -i <interface> -s 0 -w cifs.pcap port 445

where <interface> is the NIC the client uses to connect to the server.

Then you can open "cifs.pcap" on Wireshark and filter by "smb2" packets and export only those.

If you want to edit/redact something out, please make sure to do so on Wireshark before exporting.
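As a rough alternative to the Wireshark export, a minimal Python sketch that keeps only the records of a classic pcap file (the format tcpdump -w writes) whose captured bytes contain an SMB2 protocol identifier (0xFE "SMB", or 0xFD "SMB" for the SMB3 encrypted transform header). This is a byte-level heuristic, not a dissector: the helper filter_smb2_records is hypothetical, pcapng is not handled, and TCP segments carrying only the continuation of a large SMB2 message are dropped, so Wireshark's smb2 filter remains the reliable option:

```python
import struct

# Classic pcap magic numbers (microsecond and nanosecond timestamps).
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}
# SMB2 header and SMB3 transform (encrypted) header protocol identifiers.
SMB2_IDS = (b"\xfeSMB", b"\xfdSMB")


def filter_smb2_records(pcap: bytes) -> tuple[bytes, int]:
    """Return (filtered pcap bytes, number of kept records)."""
    if len(pcap) < 24:
        raise ValueError("truncated pcap global header")
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", pcap[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    out = bytearray(pcap[:24])  # keep the global header unchanged
    kept, offset = 0, 24
    while offset + 16 <= len(pcap):
        # Record header: ts_sec, ts_usec, incl_len, orig_len.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap[offset:offset + 16])
        end = offset + 16 + incl_len
        if end > len(pcap):
            break  # truncated last record
        data = pcap[offset + 16:end]
        if any(pid in data for pid in SMB2_IDS):
            out += pcap[offset:end]
            kept += 1
        offset = end
    return bytes(out), kept
```

For example, Path("cifs-smb2.pcap").write_bytes(filter_smb2_records(Path("cifs.pcap").read_bytes())[0]) (with pathlib.Path) writes the reduced capture next to the original.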
Comment 27 Sergio Lindo 2022-07-27 06:58:11 UTC
Sorry for taking so long, but at the moment I cannot test changing the setting, since doing so invalidates all cached login credentials for the Samba share on that NAS.

I expect to receive a new Synology NAS to set up after August, and I will then run tcpdump against it.
Comment 28 Enzo Matsumiya 2022-08-17 13:39:54 UTC
*** Bug 1202463 has been marked as a duplicate of this bug. ***
Comment 29 Ben Walter 2022-08-17 20:38:55 UTC
Created attachment 860865 [details]
Packet traces

ds1517 is the Synology device tracing. mag01 is the SUSE OS tracing (I don't have captures specifically for openSUSE).
Comment 30 Paulo Alcantara 2022-08-19 20:53:07 UTC
Ben, Sergio,

I've ended up with a patch[1] that should fix the mount issues.

Please give it a try from [2] and then let me know. It should be available shortly after the build finishes.

Thanks.

[1] https://git.cjr.nz/linux.git/commit/?h=cifs-dfs&id=3297cdaf986fb16e37bfad2670226c10c33a5638
[2] https://build.opensuse.org/package/show/home:pauloac:kernel-source-bsc1198753/kernel-default