    ..
       This work is licensed under a Creative Commons Attribution 3.0 Unported
     License.
    
     http://creativecommons.org/licenses/by/3.0/legalcode
    
    ======================
    Containerize OpenStack
    ======================
    
    When upgrading or downgrading OpenStack, it is possible to use package-based
    management or image-based management. Containerizing OpenStack is meant to
    optimize image-based management of OpenStack. Containerizing OpenStack
    solves a manageability and availability problem with the current
    state-of-the-art deployment systems in OpenStack.
    
    Problem description
    ===================
    
    Current state-of-the-art deployment systems use either image-based or
    package-based upgrades.
    
    
    Image-based upgrades are utilized by TripleO. When TripleO updates a system,
    it creates an image of the entire disk and deploys that rather than just the
    parts that compose the OpenStack deployment. This results in significant
    loss of availability. Furthermore, running VMs are shut down in the imaging
    process. However, image-based systems offer atomicity, because all related
    software for a service is updated in one atomic action by reimaging the system.
    
    
    Other systems use package-based upgrades. Package-based upgrades suffer from
    a non-atomic nature. An update may update one or more RPM packages. The update
    process could fail for any number of reasons, and there is no way to back
    out the existing changes. Typically in an OpenStack deployment it is
    desirable to update a service that does one thing, including its dependencies,
    as an atomic unit. Package-based upgrades do not offer atomicity.
    
    
    To solve this problem, containers can be used to provide an image-based update
    approach which offers atomic upgrade of a running system with minimal
    interruption in service. A rough prototype of compute upgrade [1] shows
    approximately a 10 second window of unavailability during a software update.
    The prototype keeps virtual machines running without interruption.
    
    Use cases
    ---------
    
    1. Upgrade or roll back OpenStack deployments atomically. End-user wants to
       change the running software versions in her system to deploy a new upstream
       release without interrupting service for significant periods.
    2. Upgrade OpenStack component by component. End-user wants to upgrade her
       system in fine-grained chunks to limit damage from a failed upgrade.
    3. Roll back OpenStack component by component. End-user experienced a failed
       upgrade and wishes to roll back to the last known good working version.
    
    
    Proposed change
    ===============
    An OpenStack deployment based on containers is represented as a tree structure
    with each node representing a container set, and each leaf representing a
    container.
    
    The full properties of a container set:
    
    * A container set is composed of one or more container subsets or one or more
      individual containers
    * A container set provides a single logical service
    * A container set is managed as a unit during startup, shutdown, and version
      changes
    * Each container set is launched together as one unit
    * A container set with subsets is launched as one unit including all subsets
    * A container set is not atomically managed
    * A container set provides appropriate hooks for high availability monitoring
    
    The full properties of a container:
    
    * A container is atomically upgraded or rolled back
    * A container includes a monotonically increasing generation number to identify
      the container's age in comparison with other containers
    * A container has a single responsibility
    * A container may be super-privileged when it needs significant access to the
      host including:
    
      * The network namespace of the host
      * The UUID namespace of the host
      * The IPC namespace of the host
      * Filesystem sharing of the host for persistent storage
    
    * A container may lack any privileges when it does not require significant
      access to the host.
    * A container should include a check function for evaluating its own health.
    * A container will include proper PID 1 handling for reaping exited child
      processes.
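
The generation number described above could be used to choose upgrade and rollback targets. The following is a minimal Python sketch, not Kolla code: the `ContainerImage` record and both helper functions are hypothetical names introduced for illustration, and the rollback helper assumes that the previous generation is the last known good one.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContainerImage:
    """Hypothetical record for one build of a single-responsibility container."""
    name: str        # for example "nova-compute"
    generation: int  # monotonically increasing; a larger value means newer


def newest(images: Sequence[ContainerImage]) -> ContainerImage:
    """Return the image with the highest generation number."""
    return max(images, key=lambda image: image.generation)


def rollback_target(images: Sequence[ContainerImage],
                    running: ContainerImage) -> Optional[ContainerImage]:
    """Return the newest image that is older than the running one, or None."""
    older = [image for image in images if image.generation < running.generation]
    return newest(older) if older else None


builds = [ContainerImage("nova-compute", generation) for generation in (1, 2, 3)]
running = newest(builds)
previous = rollback_target(builds, running)
```

Because each container is replaced as a whole, switching from `running` to `previous` would be a single atomic step in this model.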
    
    The top level container sets are composed of:
    
    * database control
    * messaging control
    * high availability control
    * OpenStack interface
    * OpenStack control
    * OpenStack compute operation
    * OpenStack network operation
    * OpenStack storage operation
    
    The various container sets are composed in more detail as follows:
    
    * Database control
    
      * galera
      * mariadb
      * mongodb
    
    * Messaging control
    
      * rabbitmq
    
    * High availability control
    
      * keepalived
    
    * OpenStack interface
    
      * glance-api
      * nova-api
      * ceilometer-api
      * heat-api
    
    * OpenStack control

      * glance-controller

        * glance-registry

      * nova-controller

        * nova-conductor
        * nova-scheduler
        * metadata-service
    
      * cinder-controller
    
      * neutron-controller
    
        * neutron-server
    
      * ceilometer-controller
    
        * ceilometer-alarm
        * ceilometer-base
        * ceilometer-central
        * ceilometer-collector
        * ceilometer-notification
    
      * heat-controller
    
    * OpenStack compute operation
    
      * nova-compute
      * nova-libvirt
      * neutron-agents-linux-bridge
      * neutron-agents-ovs
    
    
    * OpenStack network operation
    
      * dhcp-agent
      * l3-agent
      * metadata-agent
      * lbaas-agent
      * fwaas-agent
    
    
    * OpenStack storage operation

      * swift-account
      * swift-base
      * swift-container
      * swift-object
      * swift-proxy-server
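
The tree of container sets and containers listed above can be modelled with two small types. This is only an illustrative Python sketch: the class names, the `leaves` helper, and the `openstack-control` identifier are assumptions and do not come from Kolla.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Container:
    """Leaf of the tree: one container with a single responsibility."""
    name: str


@dataclass
class ContainerSet:
    """Inner node: a logical service made of subsets and/or containers."""
    name: str
    members: List[Union["ContainerSet", Container]] = field(default_factory=list)


def leaves(node: Union[ContainerSet, Container]) -> Iterator[Container]:
    """Yield every container below a node, depth first."""
    if isinstance(node, Container):
        yield node
    else:
        for member in node.members:
            yield from leaves(member)


# A fragment of the "OpenStack control" container set from the list above.
control = ContainerSet("openstack-control", [
    ContainerSet("glance-controller", [Container("glance-registry")]),
    ContainerSet("nova-controller", [
        Container("nova-conductor"),
        Container("nova-scheduler"),
        Container("metadata-service"),
    ]),
])

names = [container.name for container in leaves(control)]
```

Walking the leaves of a set gives the containers that would be launched together when that set is started as one unit.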
    
    In order to achieve the desired results, we plan to permit super-privileged
    containers. A super-privileged container is defined as any container launched
    with the --privileged=true flag to docker that:

    * bind-mounts specific security-crucial host operating system directories
      with -v. This includes nearly all directories in the filesystem except for
      leaf directories with no other host operating system use.
    * shares any namespace with the --ipc=host, --pid=host, or --net=host flags
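
As an illustration of the definition above, the following Python sketch assembles a `docker run` command line for a super-privileged container without executing it. The function name, its parameters, and the image name `example/nova-libvirt` are hypothetical; only the Docker flags themselves are taken from the text.

```python
from typing import Dict, List


def docker_run_args(image: str, name: str, bind_mounts: Dict[str, str],
                    share_net: bool = True, share_ipc: bool = False,
                    share_pid: bool = False) -> List[str]:
    """Build (but do not execute) a docker run command line."""
    args = ["docker", "run", "--detach", "--name", name, "--privileged=true"]
    # Namespace sharing with the host, as named in the definition above.
    if share_net:
        args.append("--net=host")
    if share_ipc:
        args.append("--ipc=host")
    if share_pid:
        args.append("--pid=host")
    # Bind-mount host directories into the container with -v.
    for host_path, container_path in bind_mounts.items():
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    return args


cmd = docker_run_args("example/nova-libvirt", "nova-libvirt",
                      {"/run": "/run"}, share_pid=True)
```

Returning the argument list rather than running it keeps the sketch free of side effects; a deploy tool could pass the list to `subprocess.run`.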
    
    
    We will not use the Docker EXPOSE operation since all containers will use
    --net=host. One motive for using --net=host is that it is inherently simpler.
    A different motive for not using EXPOSE is the 20 microsecond penalty
    applied to every packet forwarded and returned by docker-proxy.
    If EXPOSE functionality is desired, it can be added back by adding the
    ports from the default list of OpenStack ports to each Dockerfile:
    
    `Firewalls and default ports <https://docs.openstack.org/install-guide/firewalls-default-ports.html>`__.
    
    We will use the docker flag --restart=always to provide some measure of
    high availability for the individual containers and ensure they operate
    correctly as currently designed.
    
    On a pre-configured timer, a host tool will run the container's built-in
    check script via docker exec to validate that the container is operational.
    If the container does not pass its healthcheck operation, it should be
    restarted.
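
One iteration of such a host tool might look like the Python sketch below. It is an assumption-laden illustration: the check script path `/check.sh` and the function names are hypothetical, and the command runner is injectable so the logic can be exercised without Docker installed.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical location of the check script inside each container.
CHECK_SCRIPT = "/check.sh"


def run_command(args: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(args), check=False).returncode


def check_and_restart(name: str,
                      run: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Run the check script via docker exec; restart the container on failure.

    Returns True when the container was healthy and False when it had to be
    restarted.
    """
    if run(["docker", "exec", name, CHECK_SCRIPT]) == 0:
        return True
    run(["docker", "restart", name])
    return False
```

A host tool would call `check_and_restart` for each container on its timer. This complements `--restart=always`, which only reacts when the container process exits, not when it is running but unhealthy.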
    
    Integration of metadata with fig or a similar single node Docker orchestration
    tool will be implemented. Even though fig executes on a single node, the
    containers will be designed to run multi-node and the deploy tool should take
    some form of information to allow it to operate multi-node. The deploy tool
    should take a set of key/value pairs as inputs and convert them into
    the environment passed to Docker. These key/value pairs could be a file
    or environment variables. We will not offer integration with multi-node
    scheduling or orchestration tools, but instead expect our consumers to manage
    each bare metal machine using our integration with fig or a similar tool.
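
The conversion of key/value pairs into the environment passed to Docker could be sketched as follows. This is a hypothetical illustration in Python: the `KEY=VALUE` file format, the function names, and the example keys are assumptions, while `--env` is the standard `docker run` option for setting an environment variable.

```python
from typing import Dict, Iterable, List, Mapping


def parse_key_value_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    pairs: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected KEY=VALUE, got {line!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def docker_env_args(pairs: Mapping[str, str]) -> List[str]:
    """Turn key/value pairs into --env arguments for docker run."""
    args: List[str] = []
    for key in sorted(pairs):  # sorted for a stable, reproducible command line
        args += ["--env", f"{key}={pairs[key]}"]
    return args


# Hypothetical settings; in practice these lines would come from a file.
file_pairs = parse_key_value_lines([
    "# hypothetical settings",
    "DB_HOST=10.0.0.5",
    "",
    "RABBIT_USER = guest",
])
env_args = docker_env_args(file_pairs)
```

The same `docker_env_args` helper would work for pairs taken from environment variables, since it accepts any mapping.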
    
    Any contributions from the community of the required metadata to run these
    containers using a multi-node orchestration tool will be warmly received but
    generally won't be maintained by the core team.
    
    
    The technique for launching the deploy script is not handled by Kolla. This
    is a problem for a higher level deployment tool such as TripleO or Fuel to
    tackle.
    
    Logs from the individual containers will be retrievable in some consistent way.
    
    Security impact
    ---------------
    
    
    Container usage with super-privileged mode may impact security. For
    example, when using --net=host mode and bind-mounting /run, which is necessary
    for a compute node, it is possible that a compute breakout could corrupt the
    host operating system.
    
    To mitigate security concerns, solutions such as SELinux and AppArmor should
    be used where appropriate to contain the security privileges of the containers.
    
    Performance Impact
    ------------------
    
    
    The upgrade or downgrade process changes from a multi-hour outage to a 10
    second outage across the system.
    
    Implementation
    ==============
    
    
    Assignee(s)
    -----------
    
    Primary assignee:
    
    kolla maintainers
    
    Work Items
    ----------
    
    1. Container Sets
    2. Containers
    3. A minimal proof of concept single-node fig deployment integration
    4. A minimal proof of concept fig healthchecking integration
    
    Testing
    =======
    
    Functional tests will be implemented in the OpenStack check/gating system to
    automatically check that containers pass each container's functional tests
    stored in the project's repositories.
    
    Documentation Impact
    ====================
    
    The documentation impact is unclear as this project is a proof of concept
    with no clear delivery consumer.
    
    
    References
    ==========
    
    * [1] https://github.com/sdake/compute-upgrade