Bug 1205258

Summary: Multi-step transactions for transactional updates
Product: [openSUSE] openSUSE Distribution
Reporter: Eric Levy <contact>
Component: MicroOS
Assignee: Ignaz Forster <iforster>
Status: NEW ---
QA Contact: E-mail List <qa-bugs>
Severity: Enhancement
Priority: P5 - None
CC: iforster
Version: Leap 15.5
Target Milestone: ---
Hardware: All
OS: openSUSE Leap 15.5
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Eric Levy 2022-11-09 20:54:51 UTC
Transactional updates, available as an installation option in Leap, ensure that each change to the system is captured and recorded within a transaction. However, the limitation, compared to normal systems, is that system changes cannot be tested for their practical effect until after a system reboot. Afterward, if any problems are found, the transaction must be rolled back, and the change must be attempted again, after another reboot. Such a process is particularly cumbersome if the outcome of a change cannot be determined until it is tested in practice.

Proposed is a mechanism for opening a transaction, which would remain open after the completion of the command opening it. Further commands would then allow entry into the open transaction, either in read-only or read-write mode, until the transaction is finally closed.

For example, one might consider the following command sequence:

1. sudo transactional-update begin
2. sudo transactional-update enter -w
  a. sudo zypper ins somepackage
  b. exit
3. sudo transactional-update enter
  a. cd ~/project
  b. make
  c. exit
4. sudo transactional-update commit

In the sequence, the user has requested that a new transaction be opened. It will remain open even after the command completes. The user has then entered the environment of the transaction, with the ability to modify the system. The tool creates a shell in the isolated environment, in which the user has selected a package for installation, and then has exited. The transaction remains open even after the shell closes. To test whether the system modifications carry the desired effect, the user has entered again into the transaction environment, but without the ability to modify the system. The user has then attempted a build process, in the modified system environment, to determine whether the system changes have had the desired effect. In this environment, the installed package is available, though not on the main system. The user has finally exited the environment, and then committed the system changes. (Alternatively, the user may have destroyed the transaction, if needed.)

After the transaction is committed, it represents the system state for the next boot session.

In a normal system, a user may install packages or make other system changes, and test the effects immediately. The user may then make further changes, or reverse earlier ones, again, with immediate effect. The proposed enhancement allows the same capability in an isolated environment, which is optionally committed for the next boot cycle.

It is possible to achieve a comparable effect by simply adding a read-only option to the transactional-update command. This option could be combined with the continue switch. However, explicit semantics for the use case are likely to reduce the incidence of human error and ensure best practices.
Comment 1 Stefan Hundhammer 2022-11-10 09:03:15 UTC
Eric, thank you for your very elaborate and well thought-out idea.

This is not really for the YaST installer; I'll check where to forward this to, so it doesn't get lost.
Comment 2 Eric Levy 2022-11-10 09:19:01 UTC
I understand about the category.

I am not familiar with the layout of the reporting system. I probably made a few mistakes, and the system may have made some choices automatically. I agree the issue is not related to YaST, and would be happy to see the report processed in the most appropriate way.
Comment 3 Stefan Hundhammer 2022-11-10 09:24:08 UTC
Don't worry about that; you have no way of knowing those details. We'll find somebody to take care of this.
Comment 4 Stefan Hundhammer 2022-11-10 09:33:14 UTC
Ignaz Forster is taking over from here.
Comment 5 Ignaz Forster 2022-11-10 10:15:44 UTC
Most of the described functionality is available via the `tukit` command already. Your example would look as follows:

1. sudo tukit open
2. sudo tukit call <id> bash
  a. sudo zypper in somepackage
  b. exit
Step 3 has conceptual problems. First of all there is the problem that you want to access data in your user's home, so you'd have to switch back to that user first (as you are calling the outer command with sudo). But as /home is not part of the root file system it is also not mounted by default in the transactional-update environment.
The following commands could be used to get the behavior you want (minus the ro part):
3. sudo tukit call <id> bash
  a. mount -o subvol=@/home
  b. su - user
  c. cd ~/project
  d. make
  e. exit
  f. exit
4. sudo tukit close <id>
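
The steps above could be wrapped in a few shell functions that remember the open snapshot between commands, roughly matching the begin/enter/commit verbs from the original request. This is only a minimal sketch, not an existing tool: the function names, the `TUKIT` and `TU_STATE` variables, and the state file path are made up for illustration, and it assumes that `tukit open` prints a line of the form `ID: <n>`. Only the `open`, `call` and `close` subcommands named in this report are used, so there is no read-only mode and /home is still not mounted inside the snapshot.

```shell
#!/bin/sh
# Hypothetical wrapper: maps begin/enter/commit onto tukit open/call/close.
# Assumption: "tukit open" prints a line like "ID: 42".
# Assumption: the state file location below is arbitrary.

TUKIT="${TUKIT:-tukit}"                           # command used to reach tukit
TU_STATE="${TU_STATE:-/run/tu-open-transaction}"  # remembers the open snapshot id

tu_begin() {
    # Open a snapshot and store its id so later commands can re-enter it.
    id=$($TUKIT open | sed -n 's/^ID: *//p') || return 1
    [ -n "$id" ] || { echo "could not determine snapshot id" >&2; return 1; }
    printf '%s\n' "$id" > "$TU_STATE"
}

tu_enter() {
    # Run the given command (default: bash) inside the open snapshot.
    # The snapshot is always writable here; tukit call has no ro switch
    # in the sequence shown in this report.
    id=$(cat "$TU_STATE") || return 1
    [ "$#" -gt 0 ] || set -- bash
    $TUKIT call "$id" "$@"
}

tu_commit() {
    # Close the snapshot so it is used for the next boot, then forget the id.
    id=$(cat "$TU_STATE") || return 1
    $TUKIT close "$id" && rm -f "$TU_STATE"
}
```

Sourced in a root shell, the sequence would then be `tu_begin`, `tu_enter zypper in somepackage`, `tu_enter` for an interactive shell, and finally `tu_commit`.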

To get a "full" emulation of the new system you could also use `mount -a` in step 3a, which would also mount directories such as /var and /srv into the transactional-update environment, but that will obviously break the isolation. Due to that I don't really want to include that as an official command, but it may be worth writing a blog post or a Wiki entry about this?
Comment 6 Eric Levy 2022-11-10 11:00:55 UTC
Thank you for the explanation. I had no idea about such support coming from tukit. I had reviewed the manual for the command transactional-update, but I have not found any comprehensive guide, tutorial, or manual for the overall system, that is, the general design of the TU framework. I have generally found only documentation for specific components, or high-level overviews of the concept. 

A comprehensive document that targets the practical use cases for administrators and users would be very helpful. Elaboration of the inner plumbing would be helpful too, as an indispensable resource for disaster recovery. To my mind, such a document would be useful overall, much more so than just a blog post on this particular subject, especially because it may be hard for the right person to find the resource at the time when needed. However, it certainly might help someone at some time, and certainly would be more helpful for the subject than any resource I have found so far.

Yes, I agree you found a flaw regarding the privileges of the user shell, which could be resolved through appropriate modifications of the concept. You could simply allow any user to enter the open transaction, with regular permissions.

At the present time, I feel compelled to push back against the idea of lack of isolation being a major problem. Locations storing user data are meant to be manipulated by operations initiated by the user, in ways that the user understands. My motivation for submitting the request is that in a vanilla installation, it is possible for an administrator to install a package on an active system, immediately interact with the modified system, and then, if desired, reverse the system changes, or add new ones. Surely it is possible that such a sequence would leave artifacts in data locations, but such effects are well understood, and no more harmful in the transactional scenario than in the classical one.

For cases where some isolation is desired, you could consider adding an option for providing the full environment through temporary snapshots, such that the changes to these locations within the test session will be lost, even if the changes to the system are committed. The purpose would be to test operations, for resolving whether the new system image is one that would be useful for future boot sessions, or not. For example, the build operation is reproducible. If it succeeds in the test environment, it will succeed in the new system configuration, once it becomes active. Please remember also, if I am not mistaken, that rollback of a previous transaction will not roll back modifications to data directories occurring during any boot sessions following the transaction. Thus, you could be providing more isolation than available currently, by opening the possibility for utilizing the system changes in a special session.

My understanding would be that if an administrator must roll back a transaction, any problems encountered would be the same as the ones that cause you to hesitate on the idea I am proposing. The proposed method is simply a means to test the consequences of a transaction before committing it for future boot sessions, and without interrupting the boot session of the live system. Once the changes are confirmed, and the transaction committed, then the system would be rebooted, with predictable effects.

If I am not making some important oversight, you might consider supporting the work flow through an accessible set of operations in the tool chain. I believe the concerns I have raised originally are rather common, and without a way to address them, it may be harder for many sites to adopt the system, or to find ways to resolve unforeseen problems after adoption.