Discussion:
Checkpoint/restore of capabilities
Denis Huber
2016-08-31 15:21:54 UTC
Permalink
Dear Genode community,

as some of you already know, I want to create a checkpoint/restore
mechanism for components on Genode 16.05 and Fiasco.OC. I want to create
a component which monitors the PD and CPU sessions of its children (=
targets for checkpoint/restore) to have access to their memory and
thread states.

For a complete checkpoint and restore of a child I also need to store
the capabilities used by a child. How can I acquire and also restore all
these capabilities?

After a restore of a component the capability space shall be the "same"
as before the checkpoint:
1) The capabilities after the restore shall point to corresponding
object identities.
2) Also the capabilities after the restore shall be on the same slot
(have the same address) in the capability space as before the checkpoint.

The capability space resides in the kernel and Genode does not offer an
API to manipulate it. Is there a way to accomplish my goal with Genode's
API anyway?


Kind regards,
Denis

------------------------------------------------------------------------------
Norman Feske
2016-08-31 16:43:43 UTC
Permalink
Hello Denis,
Post by Denis Huber
After a restore of a component the capability space shall be the "same"
1) The capabilities after the restore shall point to corresponding
object identities.
2) Also the capabilities after the restore shall be on the same slot
(have the same address) in the capability space as before the checkpoint.
The capability space resides in the kernel and Genode does not offer an
API to manipulate it. Is there a way to accomplish my goal with Genode's
API anyway?
there is no ready-to-use solution via the Genode API because the way
capabilities are handled vastly differs between the various kernels.
Manipulating the capability space of a remote component wouldn't even be
possible on some kernels. However, since you are using a specific kernel
(Fiasco.OC) that provides an asynchronous map operation, the problem can
be tackled in a kernel-specific way.

I would propose to extend the existing 'Foc_native_pd' RPC interface [1]
with RPC functions for requesting and installing capabilities from/into
the capability space of the PD.

[1]
https://github.com/genodelabs/genode/tree/master/repos/base-foc/include/foc_native_pd

The function for requesting a capability would have an iterator-like
interface that allows the client to iterate over the PD-local selector
numbers and sequentially obtain the underlying capabilities as
Genode::Native_capability objects (which can be delegated via RPC). Each
call would return a Genode::Native_capability and a selector number of
the capability to request with the next RPC call. In a first version,
you may simply iterate over all numbers up to the maximum selector
number, returning invalid capabilities for unused selectors. The
iterator-like interface would then be a performance optimization.

The function for installing a capability would take a
Genode::Native_capability and the destination selector number as
arguments. The implementation of the 'Foc_native_pd' interface resides
in core, which has access to all capabilities. The implementation would
directly issue Fiasco.OC system calls (most likely 'l4_task_map') to
install the given capability into the targeted PD.
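Expressed as self-contained C++ (with stand-in types instead of the real Genode/Fiasco.OC headers, and the names `request_cap` and `install_cap` being assumptions rather than existing API), the two proposed RPC functions might look like this:

```cpp
#include <cstdint>
#include <map>
#include <utility>

/* stand-ins for Genode types, not the real headers */
using Selector = std::uint64_t;
struct Native_capability
{
	long badge = -1;
	bool valid() const { return badge >= 0; }
};

/* hypothetical extension of the Foc_native_pd session interface */
struct Foc_native_pd_ext
{
	/* sparse model of the PD's capability space: selector -> capability */
	std::map<Selector, Native_capability> _space;

	/*
	 * Iterator-like request: return the capability at 'sel' (invalid if
	 * the slot is empty) plus the next populated selector to query, or
	 * 0 when the iteration is finished.
	 */
	std::pair<Native_capability, Selector> request_cap(Selector sel) const
	{
		Native_capability cap { };
		auto it = _space.find(sel);
		if (it != _space.end()) cap = it->second;

		auto next = _space.upper_bound(sel);
		Selector next_sel = (next != _space.end()) ? next->first : 0;
		return { cap, next_sel };
	}

	/*
	 * Install: in core, this is where 'l4_task_map' would be issued;
	 * the sketch merely records the mapping in the model.
	 */
	void install_cap(Native_capability cap, Selector dst_sel)
	{
		_space[dst_sel] = cap;
	}
};
```

The returned "next selector" is what turns the brute-force scan into the performance optimization mentioned above: the client skips unused slots instead of probing each one.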

Does this sound like a reasonable plan?

Cheers
Norman
--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth

------------------------------------------------------------------------------
Denis Huber
2016-09-03 11:22:19 UTC
Permalink
Hello Norman,

your approach sounds really good and promising. But I have a problem
when storing the Capabilities from the Cap Space of the child:

The child component shall be migrated from one ECU to another. The
Genode system on the other ECU may already have the Rpc_objects which
the child needs (e.g. shared dataspaces), but their object identities
differ (e.g. other addresses in memory), or the Rpc_objects do not
exist at all (e.g. a session object between the child and a service).

During a restore, I will have to relink the Native_capability to the
available Rpc_object or simply recreate the Rpc_object. In both cases I
have to know the types of the Native_capabilities when I snapshot them
from the Cap Space of the child. Is there a way to find out the type of
a Native_capability through an API function?

If there is no ready-to-use function/approach, can I intercept the type
to which a Native_capability is reinterpreted in Rpc_entrypoint::manage
as a workaround solution?


Kind regards,
Denis
------------------------------------------------------------------------------
Norman Feske
2016-09-09 08:58:20 UTC
Permalink
Hi Denis,
Post by Denis Huber
The child component shall be migrated from one ECU to another. The
Genode system on the other ECU may have the Rpc_objects, which the child
needs (e.g. shared dataspaces), but their object identities are
different (e.g. other addresses in memory) or the Rpc_objects do not
exist (e.g. a session object between the child and a service).
so the problem goes much deeper than merely requesting and populating
the child's capability space. You need to replicate the entire child's
execution environment at the destination ECU. That means for each
capability in possession of the child, your runtime needs to know the
exact meaning. E.g., if the child has a session capability to a session
created with certain session arguments, the same kind of session must be
re-created at the destination ECU. Of course, the same holds for all
dataspaces, threads, and other RPC objects that the child can reference
via the capabilities present in its capability space.

The logical consequence is that the runtime must virtualize all services
used by the child. E.g. if the child creates a LOG session, the runtime
would create a session to a LOG service in the child's name but hand out
a capability to a locally implemented LOG-session wrapper - similar to
what you have already done for the RAM service. So when migrating the
child, you know exactly what the various capabilities in the child's capability
space mean and can transfer the underlying state to the destination ECU.
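The wrapper pattern described above can be sketched in self-contained C++. Note that `Log_session` and `Log_session_wrapper` below are simplified stand-ins, not the real Genode LOG-session interface; the point is only the shape of the idea: forward each request to the real service while recording the state needed to re-create the session on the destination ECU.

```cpp
#include <string>
#include <vector>

/* simplified stand-in for a session interface provided by the parent */
struct Log_session
{
	virtual void write(std::string const &msg) = 0;
	virtual ~Log_session() { }
};

/*
 * Wrapper handed to the child instead of the real session capability.
 * It forwards each request and records the state needed to re-create
 * the session on the destination ECU (session arguments, past output).
 */
struct Log_session_wrapper : Log_session
{
	Log_session              &_real;          /* session opened in the child's name */
	std::string               _session_args;  /* recorded at session creation */
	std::vector<std::string>  _history;       /* checkpointed state */

	Log_session_wrapper(Log_session &real, std::string args)
	: _real(real), _session_args(std::move(args)) { }

	void write(std::string const &msg) override
	{
		_history.push_back(msg);  /* capture for checkpoint */
		_real.write(msg);         /* forward to the real service */
	}
};
```

On restore, the runtime would open a fresh session at the destination using `_session_args` and replay or re-establish the recorded state.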

In principle, this is how Noux solves the fork problem. But in the case
of Noux, I deliberately avoid populating the child's capability space
with Genode capabilities in order to alleviate the need to virtualize
many Genode services. Instead, I let the child use the Noux session as
its only interface to the outside world. At the Noux-session level, the
child does not talk about Genode capabilities but about file
descriptors, for which Noux knows the meaning. Of course there exist a
few capabilities in the child's capability space, in particular the
parent cap, the Noux-session cap, and the caps of the child's
environment. But these few capabilities are manually re-initialized by
the freshly created process after the fork.

In your case, you want to replicate the child's capability space in a
way that is transparent to the child. Like Noux, you need to have a
complete model of the child's execution environment in your runtime.
Unlike Noux, however, you want to let the child interact with various
Genode services. Consequently, your model needs to capture those
services.
Post by Denis Huber
During a restore, I will have to relink the Native_capability to the
available Rpc_object or simply recreate the Rpc_object. In both cases I
have to know the types of the Native_capabilities, when I snapshot them
from the Cap Space of the child. Is there a way to find out the type of
a Native_capability through an API function?
As discussed above, the type alone does not suffice. Your runtime needs
to know the actual semantics behind each capability, e.g., not just the
knowledge that a certain capability is a RAM-session capability but also
the information how much quota the RAM session has and which dataspaces
belong to it. Or as another example, you don't just need to know that a
capability is a file-system session but also the session arguments that
were used when the session was created.
Post by Denis Huber
If there is no ready-to-use function/approach, can I intercept the type
to which a Native_capability is reinterpreted in Rpc_entrypoint::manage
as a workaround solution?
Since your runtime needs to create a representative for each RPC object
the child interacts with in the form of a locally implemented RPC object
(managed by the runtime's entrypoint), you can in principle use the
'Rpc_entrypoint::apply' method to look up the local RPC object for a
given capability.
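A minimal model of such a capability-to-object lookup is sketched below. The types and the badge-keyed map are stand-ins; the real 'Rpc_entrypoint::apply' likewise takes a capability and a functor and invokes the functor with the matching locally managed object (or a null pointer if none matches).

```cpp
#include <functional>
#include <unordered_map>

/* stand-ins: a capability identified by its badge, and a local RPC object */
struct Native_capability { long badge; };
struct Rpc_object        { int state = 0; };

struct Entrypoint_model
{
	std::unordered_map<long, Rpc_object*> _objects;

	/* corresponds to managing an RPC object at the entrypoint */
	void manage(long badge, Rpc_object &obj) { _objects[badge] = &obj; }

	/* look up the local object for a capability and pass it to 'fn' */
	void apply(Native_capability cap,
	           std::function<void(Rpc_object*)> const &fn)
	{
		auto it = _objects.find(cap.badge);
		fn(it != _objects.end() ? it->second : nullptr);
	}
};
```

During a checkpoint, the runtime would apply each capability found in the child's capability space to recover the session state held in the corresponding local object.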

Best regards
Norman
--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth

------------------------------------------------------------------------------
Denis Huber
2016-09-10 09:52:53 UTC
Permalink
Hello Norman,

thank you for your great answer. I will follow your advice and
virtualize all necessary services that a target component uses.


Kind regards,
Denis
------------------------------------------------------------------------------
Denis Huber
2016-09-21 15:42:07 UTC
Permalink
Hello again,

I have two small problems where I need some guidance from you :)

1. I am trying to understand the mechanism of l4_task_map [1]. Are the
following thoughts correct?

* The destination and source task cap (first 2 args of l4_task_map) can
be retrieved through Pd_session::native_pd() and Foc_native_pd::task_cap().
* Send flexpage (arg #3) describes a memory area which contains the
selector number (= address) of the source task's capability.
* The send base (arg #4) is an integer which contains the address of the
capability of the destination task and also an operation code number
for, e.g., mapping or granting the capability.

[1]
https://l4re.org/doc/group__l4__task__api.html#ga0a883fb598c3320922f0560263da35e6

To iterate through all possible capabilities I need to know where the
capability space starts (first valid selector number) and where it ends.
Where can I find this information, i.e., which source files are relevant?

2. I also wanted to look up the mechanism of Noux where it
re-initializes the parent cap, the noux session cap, and the caps of a
child's environment after a fork. But I cannot find the corresponding files.


Kind regards,
Denis
------------------------------------------------------------------------------
_______________________________________________
genode-main mailing list
https://lists.sourceforge.net/lists/listinfo/genode-main
------------------------------------------------------------------------------
Stefan Kalkowski
2016-09-22 08:16:17 UTC
Permalink
Hello Denis,
Post by Denis Huber
Hello again,
I have two small problems where I need some guidance from you :)
1. I am trying to understand the mechanism of l4_task_map [1]. Are the
following thoughts correct?
* The destination and source task cap (first 2 args of l4_task_map) can
be retrieved through Pd_session::native_pd() and Foc_native_pd::task_cap().
* Send flexpage (arg #3) describes a memory area which contains the
selector number (= address) of the source task's capability.
* The send base (arg #4) is an integer which contains the address of the
capability of the the destination task and also an operation code number
for e.g. mapping or granting the capability.
[1]
https://l4re.org/doc/group__l4__task__api.html#ga0a883fb598c3320922f0560263da35e6
That is correct.
Post by Denis Huber
To iterate through all possible capabilities I need to know where the
capability space starts (first valid selector number) and where it ends.
Where can I find these information? I.e. which source files are relevant?
The capability space of each component is split between an area
controlled by core and one controlled by the component itself.
Everything underneath Fiasco::USER_BASE_CAP (in file
repos/base-foc/include/foc/native_capability.h:63) is used by core and
has the following layout: the first nine slots are reserved so as not to
interfere with the fixed capabilities of Fiasco.OC/L4Re. The only
capabilities of this fixed area that we use are the task capability
(slot 1) and the parent capability (slot 8).
The rest of the core area is divided into thread-local capabilities.
Every thread has three dedicated capabilities: a capability to its own
IPC gate (so to say its identity), a capability to its pager object, and
a capability to an IRQ object (some kind of kernel semaphore), that is
used for blocking in the case of lock-contention. You can find the
layout information again in the file:
repos/base-foc/include/foc/native_capability.h.

Everything starting from slot 200 is controlled by the component itself.
Each component has a capability allocator, and some kind of registry
containing all currently allocated capabilities that is called "cap map":

repos/base-foc/src/include/base/internal/cap_*
repos/base-foc/src/lib/base/cap_*

Currently, the per-component capability allocator is restricted (at
compile time) to at most 4K capabilities. The special component
core can allocate more capabilities, because it always owns every
capability in the system.

The capability space controlled by the component thus ranges from
200 to 4296, but it is filled sparsely. Without knowing the "cap map" of
a component, you can however check the validity of a single capability
with `l4_task_cap_valid`; have a look here:


https://l4re.org/doc/group__l4__task__api.html#ga829a1b5cb4d5dba33ffee57534a505af
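The brute-force scan over the component-controlled selector range can be sketched as follows. Here, `cap_valid` is a stand-in for the per-slot kernel query that `l4_task_cap_valid` performs, with a set modeling the sparsely filled capability space; the range constants follow the base-foc layout described above.

```cpp
#include <cstdint>
#include <set>
#include <vector>

/*
 * Component-controlled selector range on base-foc (cf.
 * repos/base-foc/include/foc/native_capability.h): slot 200 upward,
 * with room for up to 4K capabilities.
 */
constexpr std::uint64_t USER_BASE_CAP  = 200;
constexpr std::uint64_t USER_CAP_SLOTS = 4096;

/* stand-in for the kernel query done by 'l4_task_cap_valid' */
bool cap_valid(std::set<std::uint64_t> const &space, std::uint64_t sel)
{
	return space.count(sel) != 0;
}

/*
 * Collect all populated selectors of the component-controlled area -
 * the brute-force first version suggested earlier in the thread.
 */
std::vector<std::uint64_t> scan_cap_space(std::set<std::uint64_t> const &space)
{
	std::vector<std::uint64_t> found;
	for (std::uint64_t sel = USER_BASE_CAP;
	     sel < USER_BASE_CAP + USER_CAP_SLOTS; ++sel)
		if (cap_valid(space, sel))
			found.push_back(sel);
	return found;
}
```

In the real checkpointer, each selector found valid would then be fetched via the proposed Foc_native_pd extension rather than recorded as a plain number.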
Post by Denis Huber
2. I also wanted to look up the mechanism of Noux where it
re-initializes the parent cap, the noux session cap, and the caps of a
child's environment after a fork. But I cannot find the corresponding files.
AFAIK, in Noux the parent capability in the .data section of the program
gets overwritten:

repos/ports/src/noux/child.h:458
repos/ports/src/noux/ram_session_component.h:80

After that, parts of the target's main-thread initialization need to be
re-done; otherwise, e.g., the serialized form of the parent capability
in the data section would have no effect. But I'm not deeply familiar
with the Noux initialization. After some grep, I found this to be the
first routine executed by the forked process:

repos/ports/src/lib/libc_noux/plugin.cc:526

It shows how the parent capability gets set and how the environment
gets re-loaded.

Best regards
Stefan
--
Stefan Kalkowski
Genode Labs

https://github.com/skalk · http://genode.org/

------------------------------------------------------------------------------
Denis Huber
2016-10-02 08:12:25 UTC
Permalink
Hello Stefan,

thank you for the descriptive explanation :) I found out that it does
not suffice to map the (kernel) Capability from the target application
to the Checkpoint/Restore application, because the Checkpoint/Restore
application only knows already existing (Genode) Capabilities (kcap and
key value) through the interception of the Rpc_objects (e.g. own
dataspace, rm_session, etc.) that the target application uses.

Mapping a Capability gives me a new (kernel) Capability which points to
the same object identity, but has a new kcap (= Capability space slot)
value.

By intercepting all services the target application uses, the
Checkpoint/Restore application (probably) knows all necessary
Capabilities which are created by requests to the parent. But what
about Capabilities which are created by a local service of the
target application?

The target application could create its own service with a root and a
session Rpc_object and manage requests through an Entrypoint. Although
the Entrypoint creates new Capabilities through the PD session, which
the Checkpoint/Restore application intercepts (PD::alloc_rpc_cap), the
Checkpoint/Restore application cannot associate the created Capability
with a concrete Rpc_object created by the target application itself.

To solve this problem, I have found no solution that is transparent to
the target application or that works without modifying the kernel. A
non-transparent, but user-level, solution would be to let the
Checkpoint/Restore application implement the service of the target
application. But this would require rewriting existing Genode
components, which I would like to avoid.

Perhaps someone in the Genode community has an idea, how I can get
access to the target application's Rpc_objects created by its own service.


Kind regards,
Denis
Post by Norman Feske
Hello Denis,
Post by Denis Huber
Hello again,
I have two small problems where I need some guidance from you :)
1. I am trying to understand the mechanism of l4_task_map [1]. Are the
following thoughts correct?
* The destination and source task cap (first 2 args of l4_task_map) can
be retrieved through Pd_session::native_pd() and Foc_native_pd::task_cap().
* Send flexpage (arg #3) describes a memory area which contains the
selector number (= address) of the source task's capability.
* The send base (arg #4) is an integer which contains the address of the
capability of the the destination task and also an operation code number
for e.g. mapping or granting the capability.
[1]
https://l4re.org/doc/group__l4__task__api.html#ga0a883fb598c3320922f0560263da35e6
That is correct.
Post by Denis Huber
To iterate through all possible capabilities I need to know where the
capability space starts (first valid selector number) and where it ends.
Where can I find these information? I.e. which source files are relevant?
The capability space of each component is split between an area
controlled by core, and one controlled by the component itself. The
first area (see repos/base-foc/include/foc/native_capability.h:63) is
used by core, and
has the following layout: the first nine slots are reserved to not
interfere with fixed capabilities of Fiasco.OC/L4Re. The only
capabilities of this fixed area that we use are the task capability
(slot 1) and the parent capability (slot 8).
The rest of the core area is divided into thread-local capabilities.
Every thread has three dedicated capabilities: a capability to its own
IPC gate (so to say its identity), a capability to its pager object, and
a capability to an IRQ object (some kind of kernel semaphore), that is
used for blocking in the case of lock contention. You can find the
corresponding definitions in
repos/base-foc/include/foc/native_capability.h.
Everything starting from slot 200 is controlled by the component itself.
Each component has a capability allocator, and some kind of registry; see
repos/base-foc/src/include/base/internal/cap_* and
repos/base-foc/src/lib/base/cap_*.
Currently, the per-component capability allocator is (compile-time)
restricted to a number of up to 4K capabilities. The special component
core can allocate more capabilities, because it always owns every
capability in the system.
The capability space controlled by the component thereby ranges from
200-4296, but it is filled sparsely. When not knowing the "cap map" of a
component, you can however check the validity of a single capability
selector via l4_task_cap_valid:
https://l4re.org/doc/group__l4__task__api.html#ga829a1b5cb4d5dba33ffee57534a505af
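The brute-force scan described above could be sketched as follows (plain C++; the `cap_valid` callback is a stand-in for the l4_task_cap_valid system call, and the 200/4296 bounds are the ones quoted above):

```cpp
#include <functional>
#include <vector>

// Component-controlled slot range as described above (filled sparsely).
enum { FIRST_COMPONENT_SLOT = 200, LAST_COMPONENT_SLOT = 4296 };

// Collect all valid slots by probing each one, as a monitor would do
// with l4_task_cap_valid() when the component's cap map is unknown.
inline std::vector<unsigned>
scan_cap_space(std::function<bool(unsigned)> const &cap_valid)
{
    std::vector<unsigned> valid;
    for (unsigned slot = FIRST_COMPONENT_SLOT; slot <= LAST_COMPONENT_SLOT; slot++)
        if (cap_valid(slot))
            valid.push_back(slot);
    return valid;
}
```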
Post by Denis Huber
2. I also wanted to look up the mechanism of Noux where it
re-initializes the parent cap, the noux session cap, and the caps of a
child's environment after a fork. But I cannot find the corresponding files.
AFAIK, in Noux the parent capability is poked directly into the .data
section of the program, see:
repos/ports/src/noux/child.h:458
repos/ports/src/noux/ram_session_component.h:80
After that, parts of the main-thread initialization of the target need
to be re-done; otherwise, e.g., the serialized form of the parent
capability in the data section would have no effect. But I'm not well up
with respect to Noux initialization. After some grep, I found this being
done in:
repos/ports/src/lib/libc_noux/plugin.cc:526
It shows how the parent capability gets set and the environment gets
re-loaded.
Best regards
Stefan
Norman Feske
2016-10-07 09:34:10 UTC
Permalink
Hi Denis,
Post by Denis Huber
The target application could create its own service with a root and
session Rpc_object and manage requests through an Entrypoint. Although
the Entrypoint creates new Capabilities through the PD session which the
Checkpoint/Restore intercepts (PD::alloc_rpc_cap). The
Checkpoint/Restore application cannot associate the created Capability
to a concrete Rpc_object which is created by the target application itself.
that is true. The monitoring component has no idea about the meaning of
RPC objects created internally within the child.

But the child never uses such capabilities to talk to the outside world.
If such a capability is created to provide a service to the outside
world (e.g., a session capability), your monitoring component will
actually get hold of it along with the information of its type. I.e.,
the child passes a root capability via the 'Parent::announce' RPC
function to the monitoring component, or the monitoring component
receives a session capability as a response of a 'Root::session' RPC
call (which specifies the name of the session type as argument).

Those capabilities are - strictly speaking - not needed to make the
child happy, but merely to enable someone else to use the child's
service. However, there is also the case where the child uses RPCs in a
component-local way. Even though the monitoring component does not need
to know the meaning behind those capabilities, it needs to replicate the
association of the component's internal RPC objects with the
corresponding kernel capabilities.
Post by Denis Huber
To solve this problem I did not find any solutions which is transparent
to the target application nor is possible without modifying the kernel.
A non-transparent, but user-level solution would be to let the
Checkpoint/Restore application implement the service of the target
application. But this will impose rewriting existing Genode components,
which I would avoid.
Perhaps someone in the Genode community has an idea, how I can get
access to the target application's Rpc_objects created by its own service.
This is indeed a tricky problem. I see two possible approaches:

1. Because the monitoring component is in control of the child's PD
session (and thereby the region map of the child's address space), it
may peek and poke in the virtual memory of the child (e.g., it may
attach a portion of the child's address space as a managed
dataspace to its own region map). In particular, it could inspect
and manipulate the child-local meta data for the child's capability
space where it keeps the association between RPC object identities
and kcap selectors. This approach would require the monitor to
interpret the child's internal data structures, similar to what a
debugger does.

2. We may let the child pro-actively propagate information about its
capability space to the outside so that the monitoring component can
conveniently intercept this information. E.g. as a rough idea, we
could add a 'Pd_session::cap_space_dataspace' RPC function where a
component can request a dataspace capability for a memory buffer
where it reports the layout information of its capability space.
This could happen internally in the base library. So it would be
transparent for the application code.

I think however that merely propagating information from the child
may not be enough. You also may need a way to re-assign new RPC
object identities to the capability space of the restored child.

Noux employs a mix of both approaches when forking a process. The parent
capability is poked directly into the address space of the new process
whereas all other capabilities are re-initialized locally in the child.
Maybe you could find a middle ground where the child component reports
just enough internal information (e.g., the pointer to its 'cap_map') to
let the monitor effectively apply the first approach (peeking and poking)?

Btw, just as a side remark, this problem does not exist on the base-hw
kernel where the RPC object identities are equal to the capability
selectors.

Cheers
Norman
--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Denis Huber
2016-10-10 15:27:12 UTC
Permalink
Hello Norman,

thanks again for your explanation.

It sounds good that I do not have to checkpoint the component-internal
session capabilities if they are not used by the component itself. What
about the capabilities which are created locally during Entrypoint
creation?

In particular, when the target component creates an Entrypoint object,
then it creates a Native_capability (as Ipc_server) from a capability
found in the utcb's thread control registers:

repos/base-foc/src/lib/base/ipc.cc:377

The Ipc_server capability is used in two calls to
Pd_session::alloc_rpc_cap during Entrypoint object creation. The two
calls go through Entrypoint::manage: one for the exit handler of the
Rpc_entrypoint and one for the Signal_proxy_component of the signal
API. To recreate those
Native_capabilities at restore time, I have to use the same Ipc_server
capability. How can this be done?


I also have some general questions about Genode capabilities in Fiasco.OC:
In the Genode Foundations book, on page 37, there is a figure (figure 2)
with an RPC object and its object identity. What is an object identity
in Fiasco.OC?
* How is it called there?
* Where can I find it in the source files?
* Does it comprise information about...
* ...the owner of the RPC object?
* ...which component has the data in memory?
* ...where it can be found in the address space?


Kind regards,
Denis
Post by Norman Feske
Hi Denis,
Post by Denis Huber
The target application could create its own service with a root and
session Rpc_object and manage requests through an Entrypoint. Although
the Entrypoint creates new Capabilities through the PD session which the
Checkpoint/Restore intercepts (PD::alloc_rpc_cap). The
Checkpoint/Restore application cannot associate the created Capability
to a concrete Rpc_object which is created by the target application itself.
that is true. The monitoring component has no idea about the meaning of
RPC objects created internally within the child.
But the child never uses such capabilities to talk to the outside world.
If such a capability is created to provide a service to the outside
world (e.g., a session capability), your monitoring component will
actually get hold of it along with the information of its type. I.e.,
the child passes a root capability via the 'Parent::announce' RPC
function to the monitoring component, or the monitoring component
receives a session capability as a response of a 'Root::session' RPC
call (which specifies the name of the session type as argument).
Those capabilities are - strictly speaking - not needed to make the
child happy, but merely to enable someone else to use the child's
service. However, there is also the case where the child uses RPCs in a
component-local way. Even though the monitoring component does not need
to know the meaning behind those capabilities, it needs to replicate the
association of the component's internal RPC objects with the
corresponding kernel capabilities.
Post by Denis Huber
To solve this problem I did not find any solutions which is transparent
to the target application nor is possible without modifying the kernel.
A non-transparent, but user-level solution would be to let the
Checkpoint/Restore application implement the service of the target
application. But this will impose rewriting existing Genode components,
which I would avoid.
Perhaps someone in the Genode community has an idea, how I can get
access to the target application's Rpc_objects created by its own service.
1. Because the monitoring component is in control of the child's PD
session (and thereby the region map of the child's address space), it
may peek and poke in the virtual memory of the child (e.g., it may
attach a portion of the child's address space as a managed
dataspace to its own region map). In particular, it could inspect
and manipulate the child-local meta data for the child's capability
space where it keeps the association between RPC object identities
and kcap selectors. This approach would require the monitor to
interpret the child's internal data structures, similar to what a
debugger does.
2. We may let the child pro-actively propagate information about its
capability space to the outside so that the monitoring component can
conveniently intercept this information. E.g. as a rough idea, we
could add a 'Pd_session::cap_space_dataspace' RPC function where a
component can request a dataspace capability for a memory buffer
where it reports the layout information of its capability space.
This could happen internally in the base library. So it would be
transparent for the application code.
I think however that merely propagating information from the child
may not be enough. You also may need a way to re-assign new RPC
object identities to the capability space of the restored child.
Noux employs a mix of both approaches when forking a process. The parent
capability is poked directly into the address space of the new process
whereas all other capabilities are re-initialized locally in the child.
Maybe you could find a middle ground where the child component reports
just enough internal information (e.g., the pointer to its 'cap_map') to
let the monitor effectively apply the first approach (peeking and poking)?
Btw, just as a side remark, this problem does not exist on the base-hw
kernel where the RPC object identities are equal to the capability
selectors.
Cheers
Norman
Norman Feske
2016-10-12 12:49:27 UTC
Permalink
Hello Denis,
Post by Denis Huber
I also have some general questions about Genode capabilities in
Fiasco.OC: In the Genode Foundations book, on page 37, there is a
figure (figure 2) with an RPC object and its object identity. What is
an object identity in Fiasco.OC?
* How is it called there?
Note that the book is focused on NOVA and base-hw, which resemble the
presented capability model quite closely. On other kernels like seL4 and
Fiasco.OC, there is a slight mismatch between the kernel-provided
capability mechanism and Genode's notion of capabilities. You can find a
discussion of one important distinction at [1]. Hence, on these kernels,
we need to "emulate" parts of Genode's capability model using the kernel
features at hand.

[1] http://sel4.systems/pipermail/devel/2014-November/000112.html

On Fiasco.OC, an object identity consists of two parts:

1. An IPC gate bound to the entrypoint thread. The IPC gate is
a Fiasco-internal kernel object. The user-level component
refers to it via a kernel-capability selector, which is
(like a file descriptor on Unix) a component-local number
understood by the kernel. In Genode's code, we use the term
"kcap" as an abbreviation of kernel capability selector.
The kcap can be used as an argument for kernel operations,
in particular as a destination for an IPC call.

Note that kcaps may refer to various kernel objects (like
threads, tasks). But - a few exceptions notwithstanding -
a Genode capability (as returned by 'Entrypoint::manage')
refers to the kcap for an IPC gate.

2. A system-globally unique object ID, which is allocated by
core's 'Pd_session::alloc_rpc_cap' operation. Unlike the
kcap, the kernel has no idea what this ID is about. It is
just a number. Within the Genode code, this number is called
"badge" or "Rpc_obj_key". The badge value is used at the server
side as a key for looking up the RPC object that belongs to an
incoming RPC request.

Each Genode capability carries both parts. When a component inserts a
new capability into its capability space, you can see both values as
arguments to 'Capability_map::insert_map' (in
base-foc/src/lib/base/cap_map.cc).

When a Genode capability is transferred as RPC argument (via
'copy_msgbuf_to_utcb' and 'extract_msg_from_utcb' in
base-foc/src/lib/ipc.cc), you can see that the kcap part is passed as a
'L4_MAP_ITEM' whereas the badge is transferred as plain message word.
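That split can be pictured with a toy message buffer (plain C++ sketch; the real marshalling lives in base-foc/src/lib/base/ipc.cc, and the structure below is an illustration, not Genode's actual message layout):

```cpp
#include <cstdint>
#include <vector>

// Toy IPC message: badges travel as plain data words, while the kcap
// part travels as a kernel-interpreted map item (which makes the
// kernel install the capability in the receiver's cap space).
struct Msg
{
    std::vector<uint64_t> plain_words;  // badges, ordinary arguments
    std::vector<unsigned> map_items;    // kcap selectors to be mapped
};

struct Cap { unsigned kcap; uint64_t badge; };

inline void marshal_cap(Msg &msg, Cap const &cap)
{
    msg.plain_words.push_back(cap.badge);  // transferred verbatim
    msg.map_items.push_back(cap.kcap);     // translated by the kernel
}
```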

When a Genode capability is created for an RPC object
('Entrypoint::manage' -> 'PD_session::alloc_rpc_cap'), core imprints the
new badge value into the new IPC gate. This way, whenever an IPC is sent
to the IPC gate, the receiving thread (the server) receives the badge
value of the invoked object directly from the kernel. This way, a
misbehaving client cannot deliberately fake the badge when invoking an
RPC object.
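The badge-keyed dispatch at the server side can be illustrated with a minimal object registry (plain C++ sketch, not Genode's actual Object_pool implementation):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for an RPC object living in the server component.
struct Rpc_object { std::string name; };

// Minimal object pool: the badge received with an incoming IPC is the
// key for finding the invoked RPC object. Because core imprinted the
// badge into the IPC gate, a client cannot fake it.
struct Object_pool
{
    std::map<uint64_t, Rpc_object *> _objects;

    void insert(uint64_t badge, Rpc_object *obj) { _objects[badge] = obj; }

    Rpc_object *lookup(uint64_t badge)
    {
        auto it = _objects.find(badge);
        return it == _objects.end() ? nullptr : it->second;
    }
};
```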

Given this background, I hope that my remark about base-hw in my
previous email becomes more clear. On base-hw, we can simply use the
value of a kernel capability selector as badge value. There is no need
for system-globally unique ID values.
Post by Denis Huber
* ...the owner of the RPC object?
The kcap of a Genode capability refers to an IPC gate. The IPC gate is
associated with a thread. The thread lives in a protection domain. The
owner of an RPC object is therefore implicitly the PD that created the
IPC gate (the caller of 'Entrypoint::manage') for the RPC object.
Post by Denis Huber
* ...which component has the data in memory?
Each component keeps track of its local capability space using some
meta-data structures managed internally within the 'base' library. On
Fiasco.OC, this data structure is called 'cap_map'. It maintains the
association of 'kcap' values with their corresponding badges.

For the checkpointing/restarting, these data structure must be
interpreted/updated from the monitoring component. This may be tricky
because the cap_map is not designed to be easy to manipulate from the
outside. I.e., it has an AVL tree with the badges as keys. By solely
poking new badge values into the component's cap map, the order of AVL
nodes becomes corrupted.

It may be possible to simplify the cap-map implementation, removing the
AVL tree. As far as I can see, the use cases for looking up a kcap by a
badge no longer exists except for sanity checks within the
implementation of the cap map itself. The method 'Capability_map::find'
is actually unused.
Post by Denis Huber
* ...where it can be found in the address space?
The cap map is instantiated as local static variable of the function
'Genode::cap_map' in 'base-foc/src/lib/base/cap_map.cc'. Hence, it is
located somewhere in the data segment of the binary. You'd need to
somehow communicate the pointer value from the component to the monitor,
e.g. by writing it to a dataspace shared between the two.
Post by Denis Huber
In particular, when the target component creates an Entrypoint object,
then it creates a Native_capability (as Ipc_server) from a capability
repos/base-foc/src/lib/base/ipc.cc:377
The Ipc_server capability is used in two calls to
Pd_session::alloc_rpc_cap during Entrypoint object creation. The two
calls go through Entrypoint::manage: one for the exit handler of the
Rpc_entrypoint and one for the Signal_proxy_component of the signal
API. To recreate those
Native_capabilities at restore time, I have to use the same Ipc_server
capability. How can this be done?
The Ipc_server capability is the kernel capability selector for the
entrypoint thread. This selector is used to associate the new IPC gate
with this particular thread. So IPCs sent to the IPC gate will arrive at
the entrypoint thread. It is not a regular Genode capability because it
refers to a thread instead of an IPC gate. Fortunately, the monitor is
able to get hold of this capability because the component requests it by
calling 'Cpu_thread::state' when creating a new thread
(base-foc/src/lib/base/thread_start.cc).

Yes, the topic is really complicated. Is the confusion perfect now? ;-)

Cheers
Norman
--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Denis Huber
2016-10-18 16:25:04 UTC
Permalink
Hello Norman,

you are right, it is quite complicated, but I think I understand the
capability concept in Genode with Fiasco.OC. Let me recap it:

I created a simple figure [1] to illustrate my thoughts. A component has
a capability map and a kernel-internal capability space. Each managed RPC
object has a capability which points to a capability map slot that
stores a system-global identifier called badge. The capability space
slot can be computed through the capability map slot. The corresponding
capability space slot points to the object identity, which is an IPC gate.

[1]
https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b78f529818d01b42f0b35845e36e4e1d08b22eba/drawio_genode_capability_foc.png

In order to restore a component on another ECU, the checkpointed
variables representing capabilities (entries in memory, e.g. stack) have
to be made valid. Therefore, I have to restore the IPC gate, the
capability space slot pointing to this IPC gate, and allocate a new
badge, because it is valid only in one system and the component is
migrated to another system. Also, I have to restore the capability map
slot to point to the new badge and restore the RPC object.

In the following I assume that the RPC objects of the target component
are created by the Checkpoint/Restore component (i.e. it intercepts the
session requests and provides own sessions at child creation). The other
case regarding local RPC objects of the target component will be
discussed later, if I hopefully have the time:

By virtualizing the session RPC objects and the normal RPC objects, I
can checkpoint the state of them. Thus, I can recreate an RPC object.
When I do that the RPC object has a new capability (local to the
Checkpoint/Restore component) and a valid badge. Implicitly a valid IPC
gate is also recreated. Thus, the target component has to know this
capability inside its protection domain. Therefore, the capability
space/map slot has to point to the IPC gate or to the new badge,
respectively.
* The capability space slot is recreated by issuing l4_task_map to map a
capability from core to the target child. This is done by extending
Foc_native_pd interface (see in an earlier mail from Norman).
* The capability map slot is recreated by
Capability_map::insert(new_badge, old_kcap). Thus, I have to checkpoint
the kcap by Capability_map::find(new_badge)->kcap().

Now I am missing the pointer to target component's internal capability
map. I already have all dataspace capabilities which are attached to the
target's address space. With the pointer I can cast it to a
Capability_map* and use its methods to manipulate the Avl-tree. Please
correct me if I am wrong.

Norman, you proposed a rough idea of how to obtain a dataspace
capability of the capability map through the PD_session in one of your
previous mails:
Post by Norman Feske
2. We may let the child pro-actively propagate information about its
capability space to the outside so that the monitoring component can
conveniently intercept this information. E.g. as a rough idea, we
could add a 'Pd_session::cap_space_dataspace' RPC function where a
component can request a dataspace capability for a memory buffer
where it reports the layout information of its capability space.
This could happen internally in the base library. So it would be
transparent for the application code.
Can you or of course anyone else elaborate on how it "could happen
internally in the base library"? Does core know the locations of
capability maps of other components?


Kind regards,
Denis


PS: If my thoughts contain a mistake, please feel free to correct me. It
would help me a lot :)
Norman Feske
2016-10-21 10:15:04 UTC
Permalink
Hi Denis,
I created a simple figure [1] to illustrate my thoughts. [...]
[1]
https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b78f529818d01b42f0b35845e36e4e1d08b22eba/drawio_genode_capability_foc.png
the figure is good except for the detail that the capability map should
appear within the protection domain. It is a component-local data structure.
In order to restore a component on another ECU, the checkpointed
variables representing capabilities (entries in memory, e.g. stack) have
to be made valid. Therefore, I have to restore the IPC gate, the
capability space slot pointing to this IPC gate, and allocate a new
badge, because it is valid only in one system and the component is
migrated to another system. Also, I have to restore the capability map
slot to point to the new badge and restore the RPC object.
Exactly.
In the following I assume that the RPC objects of the target component
are created by the Checkpoint/Restore component (i.e. it intercepts the
session requests and provides own sessions at child creation). The other
case regarding local RPC objects of the target component will be
By virtualizing the session RPC objects and the normal RPC objects, I
can checkpoint the state of them. Thus, I can recreate an RPC object.
I do not completely understand what you mean by "virtualizing RPC
objects". To recap the terminology, an RPC object is a data structure
that is local to the component. When restoring the virtual address space
of the component, this data structure gets re-created automatically.
However the data structure contains the capability (it is an
'Object_pool::Entry') as a member. The "local name" aka "badge" of this
capability is used as a key to look up the invoked object of an incoming
RPC request. This capability originated from 'Pd_session::alloc_rpc_cap'.
When I do that the RPC object has a new capability (local to the
Checkpoint/Restore component) and a valid badge. Implicitly a valid IPC
gate is also recreated. Thus, the target component has to know this
capability inside its protection domain. Therefore, the capability
space/map slot has to point to the IPC gate or to the new badge,
respectively.
* The capability space slot is recreated by issuing l4_task_map to map a
capability from core to the target child. This is done by extending
Foc_native_pd interface (see in an earlier mail from Norman).
* The capability map slot is recreated by
Capability_map::insert(new_badge, old_kcap). Thus, I have to checkpoint
the kcap by Capability_map::find(new_badge)->kcap().
Yes. The problem is that the latter operation is a component-local
manipulation of its cap map data structure. The monitor cannot call the
function in the target's address space directly.
Now I am missing the pointer to target component's internal capability
map.
I already have all dataspace capabilities which are attached to the
target's address space. With the pointer I can cast it to a
Capability_map* and use its methods to manipulate the Avl-tree. Please
correct me if I am wrong.
This won't work that easily. The AVL tree contains pointers that point
to some place within the target's address space. A function call would
ultimately de-reference those pointers. If you attach (a part of) the
target's address space within the monitor's address space, the pointers
would generally not be valid in the monitor's address space. Aside from
that, I do not think that it would be a good idea to let the monitor
de-reference pointer values originating from the (untrusted) target.

The AVL tree must be manipulated without relying on the original code.
To sidestep this issue, I proposed to simplify the data structure, e.g.,
by replacing the AVL tree by a list. Then, the monitor just needs to
write new badge values into the target's memory but won't need to
manipulate the target's data structures. This applies to the AVL tree
used in the cap map and the AVL tree used by the object pool (which also
uses the badge as key).

Granted, by using a plain list, the lookup becomes slower. But you
remove a show stopper for your actual research goal. Once, the
checkpointing works, we can still try to solve the AVL tree problem.
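The list proposal could be prototyped roughly like this (plain C++ sketch; Genode's real cap map lives in base-foc/src/lib/base/cap_map.cc and differs in detail). Because the list is not ordered by badge, overwriting a badge value in place leaves the structure intact, which is exactly what the monitor needs:

```cpp
#include <cstdint>

// Simplified list-based cap map: each entry associates a kcap selector
// with a badge. Unlike an AVL tree keyed by badge, rewriting the badge
// in place does not corrupt the structure.
struct Cap_entry
{
    unsigned   kcap;
    uint64_t   badge;
    Cap_entry *next;
};

// What the monitor would do after restore: poke a new badge value for
// a given kcap directly into the target's memory.
inline bool update_badge(Cap_entry *head, unsigned kcap, uint64_t new_badge)
{
    for (Cap_entry *e = head; e; e = e->next)
        if (e->kcap == kcap) { e->badge = new_badge; return true; }
    return false;
}
```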
Norman, you proposed a rough idea of how to obtain a dataspace
capability of the capability map through the PD_session in one of your
previous mails:
Post by Norman Feske
2. We may let the child pro-actively propagate information about its
capability space to the outside so that the monitoring component can
conveniently intercept this information. E.g. as a rough idea, we
could add a 'Pd_session::cap_space_dataspace' RPC function where a
component can request a dataspace capability for a memory buffer
where it reports the layout information of its capability space.
This could happen internally in the base library. So it would be
transparent for the application code.
Can you or of course anyone else elaborate on how it "could happen
internally in the base library"? Does core know the locations of
capability maps of other components?
No. But my suggestion of the PD-session extension was not concerned with
capability maps at all. The proposed mechanism would only operate on the
target's capability space. The capability map must be adjusted by the
monitor by manipulating the target's memory. Both pieces of the puzzle
are needed: the population of the target's cap space (via an interface
provided by core), and the update of the badges in the target's cap map.

By "could happen internally in the base library", I meant that the
proactive "leaking" of interesting information (like the base address of
the cap map, or the association between kcap selectors and badges) from
the target to the monitor could be hidden in the base library (which is
locally linked to each component). Because it would not be visible at
the API level, it is transparent to component developers.

Cheers
Norman
--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Denis Huber
2016-10-22 11:18:39 UTC
Permalink
Hello Norman,

thank you for the confirmation of my thoughts and for the
correction/clarification of my misunderstandings.
Post by Norman Feske
This won't work that easily. The AVL tree contains pointers that point
to some place within the target's address space. A function call would
ultimately de-reference those pointers. If you attach (a part of) the
target's address space within the monitor's address space, the pointers
would generally not be valid in the monitor's address space. Aside from
that, I do not think that it would be a good idea to let the monitor
de-reference pointer values originating from the (untrusted) target.
The AVL tree must be manipulated without relying on the original code.
To sidestep this issue, I proposed to simplify the data structure, e.g.,
by replacing the AVL tree by a list. Then, the monitor just needs to
write new badge values into the targets memory but won't need to
manipulate the target's data structures. This applies to the AVL tree
used in the cap map and the AVL tree used by the object pool (which also
uses the badge as key).
Thank you for the hint with the pointers inside an AVL tree. I did not
think my concept through to the end and missed the fact that the
pointers are only valid inside the target's address space. Thus, I will
simplify the AVL tree of the cap map and the AVL tree of the object
pool to lists, and just change the values of the list elements. I will
still need to follow the List::Element::_next pointers, but I will have
to translate each pointer to where the target's dataspace is attached
locally in the monitor.
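The pointer translation mentioned above boils down to rebasing: if the monitor attaches the target's dataspace at a different local address, a target-local pointer becomes usable in the monitor only after adding the difference of the two attachment addresses (plain C++ sketch under that assumption):

```cpp
#include <cstdint>

// Translate a pointer value that is valid inside the target's address
// space into an address usable by the monitor, given where the same
// dataspace is attached in each address space.
inline std::uintptr_t to_monitor_addr(std::uintptr_t target_ptr,
                                      std::uintptr_t target_base,
                                      std::uintptr_t monitor_base)
{
    return monitor_base + (target_ptr - target_base);
}
```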
Post by Norman Feske
No. But my suggestion of the PD-session extension was not concerned with
capability maps at all. The proposed mechanism would only operate on the
target's capability space. The capability map must by adjusted by the
monitor by manipulating the target's memory. Both pieces of the puzzle
are needed: the population of the target's cap space (via an interface
provided by core), and the update of the badges in the target's cap map.
Now I understand. You mean I can propagate information from the target
component through its capability space. For a simple example, I could
create an (unbound) IPC gate, store a pointer into its label, and use a
capability space slot to reference it. I could take one from the area
controlled by core which will (probably) not be overridden, e.g. the
last one, which is 0x1ff.
Post by Norman Feske
By "could happen internally in the base library", I meant that the
proactive "leaking" of interesting information (like the base address of
the cap map, or the association between kcap selectors and badges) from
the target to the monitor could be hidden in the base library (which is
locally linked to each component). Because it would not be visible at
the API level, it is transparent to component developers.
I could do the creation of the IPC gate and the assignment in the
startup code of the application. I found a file where a function
initializes the main thread:

base-foc/src/lib/base/thread_bootstrap.cc:30

It is called prepare_init_main_thread. Is it safe to use this function,
or is there a better one?

What do you think about my approach? Will it work, theoretically, under
the assumption that the last capability slot will not be used by any
Genode/Fiasco.OC library or by any future library/component?


Kind regards,
Denis
Norman Feske
2016-10-24 16:13:21 UTC
Permalink
Hi Denis,
Post by Denis Huber
Now I understand. You mean, I can propagate information from the target
component through its capability space. For a simple example, I could
create an (unbound) IPC gate, store a pointer into the label, and use a
capability space slot to reference it. I could take one from the area
controlled by core, which will (probably) not be overridden: I could
take the last one, which is 0x1ff.
I had a pretty simple idea in mind: The PD session could offer an RPC
function like this:

Dataspace_capability cap_space_info();

A component can call this function to obtain a RAM dataspace of a
predefined size from the PD service.

The component startup code calls this function. If it returns a valid
dataspace, it attaches the dataspace to its address space. So now, we
have a shared memory block that can be written to by the target and
inspected by the monitor (because it is the PD service that handed out
the 'cap_space_info' dataspace).

Once the shared memory is in place, the target can record information
that might be of interest for the monitor to this dataspace. E.g., it
may maintain an array of the following struct:

struct Cap_info
{
    unsigned kcap;
    addr_t   badge_ptr; /* pointer to badge value within cap map */
};

Thereby it tells the monitor all the information that is needed to
extract/update the information of the cap map.

This array could be updated initially (when the dataspace is mapped, the
cap map already contains a bunch of capabilities), and whenever a
capability is inserted/removed into/from the cap map.
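To make the bookkeeping concrete, here is a minimal sketch of how the
target could maintain such an array within the shared dataspace. This
is plain, self-contained C++, not the actual Genode API; the names
'Cap_registry', 'MAX_CAPS', and the insert/remove helpers are made up
for illustration:

```cpp
#include <cassert>
#include <cstdint>

using addr_t = std::uintptr_t;

enum { MAX_CAPS = 512 };  /* e.g., one entry per slot of a 2^9 cap space */

struct Cap_info
{
    unsigned kcap;       /* kcap selector (slot in the cap space)  */
    addr_t   badge_ptr;  /* pointer to badge value within cap map  */
    bool     used;       /* is this entry valid?                   */
};

/* hypothetical layout at the start of the shared 'cap_space_info' dataspace */
struct Cap_registry
{
    Cap_info entries[MAX_CAPS];

    /* record a capability: called by the target on cap-map insertion */
    bool insert(unsigned kcap, addr_t badge_ptr)
    {
        for (Cap_info &e : entries)
            if (!e.used) {
                e = Cap_info { kcap, badge_ptr, true };
                return true;
            }
        return false;  /* registry full */
    }

    /* forget a capability: called by the target on cap-map removal */
    void remove(unsigned kcap)
    {
        for (Cap_info &e : entries)
            if (e.used && e.kcap == kcap)
                e.used = false;
    }
};
```

The monitor would simply iterate over 'entries' and, for each used
entry, read or patch the badge value at 'badge_ptr' in the target's
memory.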

Note that since both the target and monitor access the cap_space_info
concurrently, you'd need to add some kind of synchronization between
both, e.g., by maintaining a few bits within the cap-space-info
dataspace that act as a spinlock.
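Such a lock could look roughly like the following generic C++ sketch
(using std::atomic_flag; in the real setup, the flag itself would have
to live inside the shared dataspace so that both components operate on
the same memory word):

```cpp
#include <atomic>
#include <cassert>

/* a minimal spinlock suitable for placement in shared memory */
struct Spinlock
{
    std::atomic_flag locked = ATOMIC_FLAG_INIT;

    void lock()
    {
        /* busy-wait until we observe the flag as previously clear */
        while (locked.test_and_set(std::memory_order_acquire)) { }
    }

    void unlock()
    {
        locked.clear(std::memory_order_release);
    }
};
```

Both the target and the monitor would take this lock before reading or
updating the Cap_info array in the shared dataspace.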

Core's version of 'Pd_session::cap_space_info' would return an invalid
capability. In this case, the component skips the reporting.
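In startup code, that check boils down to a simple branch. The
following sketch only illustrates the control flow; std::optional
stands in for a Genode capability here, and 'cap_space_info_from_parent'
and 'maybe_enable_cap_reporting' are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using Dataspace_handle = std::uintptr_t;  /* stands in for Dataspace_capability */

/* core's stub returns no dataspace; a monitor's version returns one */
std::optional<Dataspace_handle> cap_space_info_from_parent(bool monitored)
{
    if (!monitored)
        return std::nullopt;          /* invalid capability */
    return Dataspace_handle(0x2000);  /* pretend: valid shared dataspace */
}

/* called once from the component's startup code */
bool maybe_enable_cap_reporting(bool monitored)
{
    auto ds = cap_space_info_from_parent(monitored);
    if (!ds)
        return false;  /* running directly on core: skip reporting */
    /* ...attach dataspace, initialize the Cap_info array, report caps... */
    return true;
}
```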
Post by Denis Huber
I could create the IPC gate and do the assignment in the startup code
of the application. I found a file where a function
base_foc/src/lib/base/thread_bootstrap.cc:30
It is called prepare_init_main_thread. Is it safe to use this function,
or is there a better one?
It's good as it is specific to Fiasco.OC.
Post by Denis Huber
What do you think about my approach? Will it work, theoretically, under
the assumption that the last capability slot is not used by any
Genode/Fiasco.OC library nor by any future library/component?
I think that your approach could work well, but it adds one IPC call
for each cap-map operation. The shared-memory idea as drafted above
does not impose this overhead. That said, I would suggest implementing
whichever way you find easier. ;-)

Cheers
Norman
--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth