Use a style file as recommended by upstream. This makes for only one
source of truth.
Preserve the previous upstream default for MD007 (the upstream default was
changed in https://github.com/markdownlint/markdownlint/pull/373).
Use the GitLab dynamic child pipelines feature to have one source of truth
for the pre-commit jobs: the pre-commit config file.
Use one cache per pre-commit hook. This should reduce the "fetching cache"
step time in gitlab-ci, since each job will have a separate cache with
only its hook installed.
We have inconsistent sets of options passed to the playbooks during our
CI runs.
Don't run ansible-playbook directly; instead, factorize the execution into
a bash function using all the common flags.
Also remove various ENABLE_* variables and instead directly test for the
relevant conditions at execution time, as this makes it more obvious and
does not force one to go back and forth in the script.
With CentOS, kubespray currently produces the following warning:
[WARNING]: TASK: bootstrap-os : Enable Oracle Linux repo: The loop variable
'item' is already in use. You should set the `loop_var` value in the
`loop_control` option for the task to something else to avoid variable
collisions and unexpected behavior.
This could bite us in nasty ways, so fix it.
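For reference, the fix boils down to naming the loop variable via loop_control — a minimal sketch (the module invocation and repo names are illustrative, not the actual task):

```yaml
- name: Enable Oracle Linux repo
  ansible.builtin.command: dnf config-manager --enable {{ ol_repo }}
  changed_when: true
  loop:
    - ol9_baseos_latest
    - ol9_addons
  loop_control:
    loop_var: ol_repo  # avoids colliding with the 'item' of the enclosing include loop
```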
c58497cde (Refactor bootstrap-os (#10983), 2024-03-27) refactored the
bootstrap-os include but didn't adapt the Amazon Linux tasks to the
actual ID of Amazon Linux ('amzn').
Re-enable the CI so we can avoid that kind of breakage.
If the node.projectcalico.org resource already exists, patch the node to set
asNumber instead of applying the resource, to prevent removing existing
fields fed by the calico-node pods.
✅ Closes: 11096
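A sketch of the idea using calicoctl (the exact task in the role may differ; `bin_dir` and `local_as` stand in for the real variables):

```yaml
- name: Calico | Patch asNumber on an existing node instead of re-applying it
  ansible.builtin.command: >-
    {{ bin_dir }}/calicoctl.sh patch node {{ inventory_hostname }}
    --patch '{"spec": {"bgp": {"asNumber": {{ local_as }}}}}'
  changed_when: true
```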
When using the kube_node_instances_with_disks* variables, there was no
configuration block using them to provision disks with the VirtualBox
provider.
This commit fixes it.
Some package requirements depend on inventory variables
(`kube_proxy_mode` in this case, but it could apply to others).
As the case seems pretty rare, instead of adding complexity to pkgs, we
add an escape hatch to use jinja conditions (sketched below).
That should be revisited if we find ourselves shoehorning lots of logic
into this later on.
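A sketch of what such an entry could look like (key names are illustrative; the actual `pkgs` schema may differ):

```yaml
pkgs:
  conntrack:
    groups: ['k8s_cluster']
    # escape hatch: a raw jinja condition evaluated against inventory variables
    when: "kube_proxy_mode == 'iptables'"
```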
Uses the logic introduced in the previous patch to convert all
kubernetes/preinstall/vars/* os specific files to the `pkgs`
dictionary.
Some niceties for devs:
- always validate the `pkgs` variable to catch mistakes in CI.
- ensure that `pkgs` is always sorted. This makes it easier to find the
packages you're looking for.
Adds infrastructure to install OS packages depending not only on OS
(family, versions, etc) but on groups.
All the information related to a particular package should reside in
the `pkgs` dictionary, which takes inspiration from the `downloads`
dictionary structure.
Since the structure we're setting in place for installing packages has
some complexity, add a JSON schema to avoid frustrating errors when
modifying the information (adding/removing packages).
This reverts commit 4b0a134bc9.
The mentioned PR breaks scale.yml. This goes back to the status quo
until a proper fix can be provided, at which point we'll reapply the
PR.
Upgrade Snapshot controller installed for all supported Kubernetes
versions to v7.0.2. Also update the manifests used to deploy the
Snapshot controller.
* feat: add user facing variable with default
* feat: remove rolebinding to anonymous users after init and upgrade
* feat: use file discovery for secondary control plane nodes
* feat: use file discovery for nodes (kubeadm config sketched after this list)
* fix: do not fail if rolebinding does not exist
* docs: add warning about kube_api_anonymous_auth
* style: improve readability of delegate_to parameter
* refactor: rename discovery kubeconfig file
* test: enable new variable in hardening and upgrade test cases
* docs: add option to config parameters
* test: multiple instances and upgrade
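For reference, file-based discovery in kubeadm looks roughly like this (a sketch; the kubeconfig path is illustrative):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  file:
    # kubeconfig dropped on the joining node by the playbook
    kubeConfigPath: /etc/kubernetes/kubeadm-discovery.conf
controlPlane: {}  # present only when joining a secondary control plane node
```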
* Move fedora ansible python install to bootstrap-os
* /bin/dir is set in bootstrap-os
* Removing ansible_os_family workarounds
Support for these distributions was merged in Ansible, no need to
override it ourselves now.
https://github.com/ansible/ansible/pull/69324 openEuler
https://github.com/ansible/ansible/pull/77275/ UnionTech OS Server 20
https://github.com/ansible/ansible/pull/78232/ Kylin
* Don't unconditionally set VARIANT_ID=coreos in os-release
WTF, this is so wrong.
Furthermore, is_fedora_coreos is already handled in bootstrap-os
* Handle Clearlinux generically
Follow-up of 4eec302e86 (since we're using the
package module anyway, let's get rid of the custom task)
* Remove leftover files for CoreOS
CoreOS was replaced by Flatcar in 058438a25 but the file was copied
instead of moved.
* Remove workarounds for resolved ansible issues
* bootstrap: Use first_found to include per distro
Using ID and VARIANT_ID directly with first_found allows for fewer manual
includes (pattern sketched after this commit's list).
Distro "families" are simply handled by symlinks.
* bootstrap: don't set ansible_python_interpreter
- Allows users to override the chosen python_interpreter with group_vars
easily (group_vars have lesser precedence than facts)
- Allows us to use vars at the task scope to use a virtual env
Ansible python discovery has improved, so those workarounds should not
be necessary anymore.
Special workaround for Flatcar, due to upstream ansible not willing to
support it.
* upgrade ansible version
Needed for with_first_found to work correctly:
https://github.com/ansible/ansible/issues/70772 fixed in 2.16
* Remove unused google cloud cloud_playbook
* Fix dpkg_selection on non-existing packages
Needed since ansible-core>2.16, see:
f10d11bcdc
* scripts: ignore download_hash download failures
Binary names on GitHub releases often change and this script might break
because of that; this commit allows ignoring these failures so the script
can still run.
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* scripts: use sha256sums for crio as well
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* scripts: add ppc64le support for crio
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
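The with_first_found pattern mentioned in the bootstrap refactor above looks roughly like this (a sketch; file layout and variable names are illustrative):

```yaml
- name: Include tasks for the detected distro
  ansible.builtin.include_tasks: "{{ item }}"
  with_first_found:
    - files:
        - "{{ os_release.ID }}-{{ os_release.VARIANT_ID | default('') }}.yml"
        - "{{ os_release.ID }}.yml"
      paths:
        - distros
```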
The new version brings the following improvements:
- Remove having to resort to python to limit tags (it is slower than
the sh equivalent, as python has a somewhat significant startup time).
- Introduce a concept of minimum version so that it only gets Kubernetes
versions supported by Kubespray.
- Fix an issue with kata changing their file scheme (the arch
specifically)
- Now download sha256/sha256sum files if provided rather than
downloading the full file and computing the hash
- A few minor style tweaks
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
Fixes bug for retrieving images with tags containing image digests.
Script now gets images from jobs and cronjobs as well.
New env variable DESTINATION_REGISTRY to push to another registry
instead of local registry.
New env variable IMAGES_FROM_FILE to pull images listed in a file
instead of getting images from a running k8s environment.
New env variable REGISTRY_PORT to override port (default is 5000).
Under the original code, leader election failed for ingress controllers
as a result of a mismatch between the election-id in the controller config
and the resourceName in the relevant rule of the 'ingress-nginx' role.
This appeared in the controller logs.
To fix the issue, a command-line option was added to container
execution (--election-id=...).
Now, the election-id agrees with the resourceName provided in
the role-ingress-nginx.yml file. A comment in that file was
changed to reflect the new logic.
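The resulting container arguments look roughly like this (a sketch; the image tag and election-id value are illustrative, and the id must match the Role's resourceName):

```yaml
containers:
  - name: ingress-nginx-controller
    image: registry.k8s.io/ingress-nginx/controller:v1.9.4
    args:
      - /nginx-ingress-controller
      - --election-id=ingress-controller-leader  # must match resourceName in role-ingress-nginx.yml
```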
Co-authored-by: Vasilis Samoladas <vsam@softnet.tuc.gr>
Co-authored-by: Mohamed Omar Zaian <mohamedzaian@gmail.com>
This should avoid permission problems when the user creating the
directory and the user creating the content are different (for instance,
when container images are saved by root because the user can't use the
container runtime).
* Refactor of kubeadm images listing
Instead of setting multiple facts, we directly create the dict we need from
kubeadm output.
* Remove useless 'default' filters in roles/download
* Only download kubeadm images where needed
The current method of waiting for the deployment state is badly designed.
When changing the deployment version, which is executed during upgrade_cluster in the previous Ansible task ("Kubernetes Apps | Install and configure MetalLB"), the next task ("Kubernetes Apps | Wait for MetalLB controller to be running") may fail with an error.
* [kubernetes] Make kubernetes 1.29.1 default
* [cri-o]: support cri-o 1.29
Use "crio status" instead of "crio-status" for cri-o >=1.29.0
* Remove GAed feature gate SeccompDefault
The SeccompDefault feature gate was removed in k8s 1.29
https://github.com/kubernetes/kubernetes/pull/121246
* Update upgrades.md
modify env serial to have real rolling upgrades
* Update upgrades.md
change section for serial
* Update docs/upgrades.md
Co-authored-by: Kundan Kumar <kundan.kumar@india.nec.com>
---------
Co-authored-by: Kundan Kumar <kundan.kumar@india.nec.com>
Handle all old dns domains:
- for nodelocaldns: in the same server block as the current dns_domain
- for coredns: suffix rewrite of each of the old dns domains to the
current one
The old version of the script downloaded all binaries and generated file checksums locally.
This was a slow process since all binaries of all architectures needed to be downloaded.
The new version simply downloads the .sha256 files containing the binary checksum in text
form which saves a lot of traffic and time.
* Enable configuring mountOptions, reclaimPolicy and volumeBindingMode for cinder-csi StorageClasses
* Check if class.mount_options is defined at all, before generating the option list
In the case where vagrant is not invoked directly from the repository
but from another location, and the Vagrantfile is "included" into
another one, we need to be able to specify the location of the vagrant
directory; as of now it's hardcoded relative to the Vagrantfile
location. This commit fixes it.
* ignore_unreachable for etcd dir cleanup
ignore_errors only ignores errors that occur within the "file" module.
However, when the target node is offline, the playbook will still fail at
this task with the node in "unreachable" state. Setting
"ignore_unreachable: true" allows the playbook to bypass offline nodes and
move on to the recovery tasks on the remaining online nodes.
* Re-arrange control plane recovery runbook steps
* Remove suggestion to manually update IP addresses
The suggestion was added in 48a182844c 4
years ago. But a new task added 2 years ago, in
ee0f1e9d58, automatically updates the API
server arg with the updated etcd node IP addresses. This suggestion is no
longer needed.
* ci: redefine multinode to node-etcd-client
This should allow catching several classes of problems rather than just
one, e.g. network plugins such as calico or cilium talking directly to
etcd.
* Dynamically define etcd host range
This has two benefits:
- We don't play the etcd role twice for no reason
- We have access to the whole cluster (if needed) to use things like
group_by.
* Convert the bug-report template to issue form
* Convert the enhancement issue template to form
* Convert "Failing Test" template to issue form
* github: Remove support request template, direct to slack instead
* Remove checks for docs using exact tags
Instead use a more generic documentation for installing kubespray as a
collection from git.
* Check that we upgraded galaxy.yml to next version
This is only intended to catch human error. The version in galaxy.yml
should be the next one (which is not the same depending on whether we're
on master or a release branch).
* Set collection version to KUBESPRAY_NEXT_VERSION
* update cilium configmap template for new routing mode and tunnel-protocol options
Ryan Lonergan ryan.tlonergan@gmail.com
* add rbac for new cilium crd in 1.14
Ryan Lonergan ryan.tlonergan@gmail.com
* add conditional for cni-install.sh that's no longer included in cilium 1.14
Ryan Lonergan ryan.tlonergan@gmail.com
* Update roles/network_plugin/cilium/templates/cilium/ds.yml.j2
Co-authored-by: Cyclinder <qifeng.guo@daocloud.io>
---------
Co-authored-by: Cyclinder <qifeng.guo@daocloud.io>
* Disable control plane allocating podCIDR for nodes when using calico
Calico does not use the .spec.podCIDR field for its IP address
management.
Furthermore, it can cause false positives from the kube controller manager
if kube_network_node_prefix and calico_pool_blocksize are unaligned, which
is the case with the defaults shipped by kubespray.
If the subnets obtained from using kube_network_node_prefix are bigger,
this would result at some point in the control plane thinking it does
not have subnets left for a new node, while calico will work without
problems.
Explicitly set a default value of false for calico_ipam_host_local to
facilitate its use in templates. (The control-plane side of this is
sketched after this list.)
* Don't default to kube_network_node_prefix for calico_pool_blocksize
They have different semantics: kube_network_node_prefix is intended to
be the size of the subnet for all pods on a node, while there can be
more than one calico block of the specified size (they are allocated on
demand).
Besides, this commit does not actually change anything, because the
current code is buggy: we don't ever default to
kube_network_node_prefix, since the variable is defined in the role
defaults.
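The control-plane side mentioned in the first point can be expressed through kubeadm, roughly (a sketch, assuming the kubeadm v1beta3 API):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    # stop the kube-controller-manager from allocating .spec.podCIDR,
    # since calico IPAM does not use it
    allocate-node-cidrs: "false"
```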
We take advantage of group_by to create the list of nodes needing new
certs, instead of manually looping inside a Jinja template.
This should make the role more readable and less susceptible to
whitespace problems.
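The group_by pattern is roughly this (a sketch; the fact and group names are illustrative):

```yaml
- name: Build the group of nodes needing new certificates
  ansible.builtin.group_by:
    key: "needs_new_certs_{{ etcd_cert_missing | default(false) | bool }}"

# later tasks can iterate over groups['needs_new_certs_True'] | default([])
# instead of filtering hostvars inside a Jinja template
```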
* Decouple role kubespray-defaults from download
Avoids re-importing the download role on every invocation of
kubespray-defaults (and skipping everything).
This has a measurable effect on playbook performance.
* Update docs referring to moved download defaults
* Mask systemd swap.target to disable swap
This is a more generic way to disable swap, since swap.target pulls in
.swap units on systemd distributions; fstab is only one way to generate
.swap units. (A sketch follows this list.)
* Unconditionally disable swap
We only care about disabling it (the "swapon" registered variable is not
used anywhere else).
This allows us to get rid of the ignore_errors, since it was added
because swapon.stdout does not exist in check_mode (see issue #6642).
* Don't explicitly disable swapOnZram
We're already masking the swap.target, which would pull the zram unit,
hence no need to handle zram-generator specifically.
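The masking itself is a one-task sketch:

```yaml
- name: Mask swap.target to disable all swap units
  ansible.builtin.systemd:
    name: swap.target
    masked: true
```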
Allow failing early (at pre-commit time) on Jinja errors, rather than
waiting until the playbook executes the invalid template.
I could not find a simple jinja pre-commit hook in the wild.
* Clean up redundant defaulting
drain_{timeout,grace_period}_after_failure don't exist at this point, so
they always default.
* Remove useless facts
The drain_*_after_failure are never used
* Try both conntrack modules instead of checking kernel version
Depending on the kernel distributor, the kernel version might not be a
correct indicator of the conntrack module to use.
Instead, we check both (and use the first found).
* Use modprobe.persistent rather than manual persistence
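A sketch of the combined idea (the real tasks may differ; the `persistent` option assumes a recent community.general):

```yaml
- name: Try both conntrack modules and keep the first that loads
  community.general.modprobe:
    name: "{{ item }}"
    state: present
    persistent: present  # replaces the manual modules-load.d template
  loop:
    - nf_conntrack
    - nf_conntrack_ipv4
  ignore_errors: true  # one of the two is expected to be missing
```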
When installed as an ansible collection, roles in
ansible_play_role_names will be designated by their FQCN (i.e.
'kubernetes-sigs.kubespray.<role-name>').
It means we need to check for both forms when checking for roles in the play.
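A sketch of such a check (the fact name is illustrative; note the Galaxy collection namespace uses an underscore):

```yaml
- name: Detect kubespray-defaults under both naming schemes
  ansible.builtin.set_fact:
    kubespray_defaults_in_play: >-
      {{ ['kubespray-defaults', 'kubernetes_sigs.kubespray.kubespray-defaults']
         | intersect(ansible_play_role_names) | length > 0 }}
```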
* Validate systemd unit files
This ensures that we fail early if we have a bad systemd unit file
(syntax errors, using an option not available in the local systemd
version, etc.)
* Hack to check systemd version for service files validation
factory-reset.target was introduced in systemd 250, the same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually execute the validation if that target is present.
This is a horrible hack which should be reverted as soon as we drop
support for distributions with systemd < 250.
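A sketch of the hack (the unit name is illustrative):

```yaml
- name: Probe for factory-reset.target as a proxy for systemd >= 250
  ansible.builtin.command: systemctl cat factory-reset.target
  register: systemd_250
  changed_when: false
  failed_when: false

- name: Install unit file, validating only where systemd supports aliasing
  ansible.builtin.template:
    src: myservice.service.j2
    dest: /etc/systemd/system/myservice.service
    mode: "0644"
    validate: "{{ 'systemd-analyze verify %s:myservice.service' if systemd_250.rc == 0 else omit }}"
```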
* ansible: upgrade to version >= 2.15.5
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: update requirements
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* contrib/openstack: fix wrong gitignore pattern
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: add missing tzdata requirement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* tests: remove some molecule tests
Those don't work in Ansible 2.15; Ansible apparently can't load builtins
that way now, and these tests are not worth it.
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
Sets ignore_unreachable: true on the `Gather ansible_default_ipv4 from all
hosts` task in fallback_ips.yml.
Without this, scale.yml will fail if a single node in the cluster is down,
which happens often with large clusters.
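The change is essentially this (a sketch; the gather_subset value is illustrative):

```yaml
- name: Gather ansible_default_ipv4 from all hosts
  ansible.builtin.setup:
    gather_subset:
      - '!all'
      - network
    filter: ansible_default_ipv4
  ignore_unreachable: true  # down nodes no longer abort scale.yml
```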
The 'Remove cri-o apt repo' job had state present but needs absent.
The 'Uninstall CRI-O packages' job had an undefined variable crio_packages,
replaced by the list of packages.
* metallb --lb-class cmd arg to support multiple load balancer implementations
* removed loadbalancer_class from metallb_config; metallb_loadbalancer_class in role defaults
* Use RandomizedDelaySec to spread out control plane certificates renewal
If the number of control plane nodes is greater than 6, using (index * 10
minutes) will fail (03:60:00 is not a valid timestamp).
Compared to just fixing the jinja expression (to use a modulo, for
example), this should avoid having two control plane nodes' certificate
updates triggered at the same time.
* Make k8s-certs-renew.timer Persistent
If the control plane happens to be offline during the scheduled
certificates renewal (node failure or anything like that), we still want
the renewal to happen.
* containerd: refactor handlers to use 'listen'
* cri-dockerd: refactor handlers to use 'listen'
* cri-o: refactor handlers to use 'listen'
* docker: refactor handlers to use 'listen'
* etcd: refactor handlers to use 'listen'
* control-plane: refactor handlers to use 'listen'
* kubeadm: refactor handlers to use 'listen'
* node: refactor handlers to use 'listen'
* preinstall: refactor handlers to use 'listen'
* calico: refactor handlers to use 'listen'
* kube-router: refactor handlers to use 'listen'
* macvlan: refactor handlers to use 'listen'
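The pattern in all of these is the same (a sketch; task and topic names are illustrative):

```yaml
handlers:
  - name: Containerd | restart containerd
    ansible.builtin.systemd:
      name: containerd
      state: restarted
    listen: Restart containerd

tasks:
  - name: Containerd | write config.toml
    ansible.builtin.template:
      src: config.toml.j2
      dest: /etc/containerd/config.toml
      mode: "0640"
    notify: Restart containerd  # notifies the 'listen' topic, not a task name
```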
It was not 'false', which made some tasks (e.g. using the systemd-resolved
template) effectively remove default search domains; this caused a DNS loop
after rebooting the node/restarting the cluster, so the localdns service
didn't run correctly.
This makes native ansible features (dry-run, changed state) easier to
have, and should have a minimal performance impact, since it only runs
on the etcd members.
* Specify the runc path when we use the containerd container engine
and change the bin_dir path.
Signed-off-by: Jin Li <qlijin@gmail.com>
* Update roles/container-engine/containerd/templates/config.toml.j2
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
---------
Signed-off-by: Jin Li <qlijin@gmail.com>
Co-authored-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
The blockSize attribute from Calico IPPool resources cannot be changed
once set [1]. Consequently, we use the one currently defined when
configuring the existing IPPool, avoiding upgrade errors by trying to
change it.
In particular, this can be useful when calico_pool_blocksize default
changes in kubespray, which would otherwise force users to add an
explicit setting to their inventories.
[1]: https://docs.tigera.io/calico/latest/reference/resources/ippool#spec
* modify variables.tf to accept AMI attributes via variables
* update README to guide users on utilizing variable-driven AMI configuration
* fix markdown lint error
This allows this task to work with a forks count > 10 and the default
configuration of sshd, which limits sessions to 10 (see
MaxSessions in sshd_config).
Since this is a delegate_to task, it connects to the same host (first
etcd) for each node in the cluster, thus easily going above 10.
Raising the ssh connection attempts allows for more robustness, without
decreasing the forks count or serialising the tasks, which could slow
the task (or the playbook as a whole, if decreasing forks).
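The tweak is essentially this (a sketch; the retry count is illustrative — `ansible_ssh_retries` maps to the ssh connection plugin's `retries` option):

```yaml
# group_vars/all.yml
ansible_ssh_retries: 10
```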
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
Refactor NRI (Node Resource Interface) activation in CRI-O and
containerd. Introduce a shared variable, nri_enabled, to streamline
the process. Currently, enabling NRI requires a separate update of
defaults for each container runtime independently, without any
verification of NRI support for the specific version of containerd
or CRI-O in use.
With this commit, the previous approach is replaced. Now, a single
variable, nri_enabled, handles this functionality. Also, this commit
separates the responsibility of verifying NRI supported versions of
containerd and CRI-O from cluster administrators, and leaves it to
Ansible.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
* [containerd] Add Configuration option for Node Resource Interface
Node Resource Interface (NRI) is a common framework for
plugging domain- or vendor-specific custom logic into container
runtimes like containerd. With this commit, we introduce the
containerd_disable_nri configuration flag, providing cluster
administrators the flexibility to opt in or out (defaulted to 'out')
of this feature in containerd. In line with containerd's default
configuration, NRI is disabled by default in this containerd role
defaults.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
* [cri-o] Add configuration option for Node Resource Interface
Node Resource Interface (NRI) is a common framework for
plugging domain- or vendor-specific custom logic into container
runtimes like containerd/crio. With this commit, we introduce the
crio_enable_nri configuration flag, providing cluster
administrators the flexibility to opt in or out (defaulted to 'out')
of this feature in cri-o runtime. In line with crio's default
configuration, NRI is disabled by default in this cri-o role
defaults.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
---------
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
* terraform-openstack: Updated extra partitions to use empty list by default
* terraform-openstack: Added possibility to enable dhcp flag critical on one interface
When running the playbook with `--tags=kubelet`, the kubelet config may change, because some variables used in the task populating `kubelet-config.yml` could be different without running the `Fetch facts` task.
* Fix containerd_registries in config_path for mirrors and remove nerdctl global insecure_registry setting
* Make containerd hosts.toml mode 0640
* Add containerd_registries_mirrors and keep containerd_registries to pass packet_debian11-calico-upgrade
Set owner/group to root/root when unarchiving kata-containers binary to prevent kata-containers binaries/directories and especially / from getting chowned to 1001:123, the file owner specified in the kata-containers archive
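A sketch of the fix (the archive path is illustrative):

```yaml
- name: Extract kata-containers release to /
  ansible.builtin.unarchive:
    src: "{{ downloads_dir }}/kata-static.tar.xz"
    dest: /
    remote_src: true
    owner: root
    group: root  # override the 1001:123 ownership embedded in the archive
```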
* tests: replace fedora35 with fedora37
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: replace fedora36 with fedora38
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* docs: update fedora version in docs
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* molecule: upgrade fedora version
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: upgrade fedora images for vagrant and kubevirt
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* vagrant: workaround to fix private network ip address in fedora
Fedora stopped supporting sysconfig network scripts, so we added a
workaround (see
https://github.com/hashicorp/vagrant/issues/12762#issuecomment-1535957837)
to fix it.
* networkmanager: do not configure dns if using systemd-resolved
We should not configure DNS if we point to systemd-resolved.
systemd-resolved uses NetworkManager to infer the upstream DNS
server, so if we set NetworkManager's DNS to 127.0.0.53 it will prevent
systemd-resolved from getting the correct network DNS server.
Thus, in this case, we just don't set this setting.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* image-builder: update centos7 image
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* gitlab-ci: mark fedora packet jobs as allow failure
Fedora networking is still broken on Packet, let's mark it as allow
failure for now.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* Update reset.yml
reset confirmation user input fix
* Update reset.yml
added default for non-interactive run in ci/cd
* fix reset_confirmation in reset.yml
* skip reset_confirmation prompt when reset_confirmation is defined via the extra-vars option (for tests)
* check both string type and object type with user_input for reset_confirmation var
* reset_confirmation_prompt in conjunction with reset_confirmation
improvement inspired by:
https://github.com/kubernetes-sigs/kubespray/pull/10288#issuecomment-1637056880
* project: update all dependencies including ansible
Upgrade to ansible 7.x and ansible-core 2.14.x. There seem to be issues
with ansible 8/ansible-core 2.15, so we remain on these versions for now.
It's quite a big bump already anyway.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: install aws galaxy collection
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* ansible-lint: disable various rules after ansible upgrade
Temporarily disable a bunch of linting rules following the ansible upgrade.
Those should be taken care of separately.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve deprecated-module ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve no-free-form ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve schema[meta] ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve schema[playbook] ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve schema[tasks] ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve risky-file-permissions ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve risky-shell-pipe ansible-lint error
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: remove deprecated warn args
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: use fqcn for non builtin tasks
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: resolve syntax-check[missing-file] for contrib playbook
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* project: use arithmetic inside jinja to fix ansible 6 upgrade
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: cleanup stale packet namespace automatically
Cancelled jobs on GitLab can produce stale VMs, as the delete playbook
will never be executed. This commit allows removing old VMs by getting
all the namespaces created from the same branch with an older pipeline
id.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: cleanup stale packet namespace after 2 hours
This ensures that we don't have any packet namespace remaining for more
than 2 hours. All jobs usually complete within 30min-1hour, so 2
hours is enough to detect a stale namespace.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: ignore vm cleanup failure
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* tests: use pipeline_id var instead of fetching namespace for cleanup packet vm
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
---------
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
A newline was not inserted between image and imagePullPolicy for some
reason with the Jinja template. Simplifying this altogether should fix it.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* Fix upgrade-path for kubelet-csr-approver
Fixes an error when you enable kubelet-csr-approver when upgrading.
It hangs waiting for the certificate to be approved since the
kubelet-csr-approver is not installed yet.
* Add missing package when using helm role
Molecule 5.0 requires ansible-core 2.12.10, so this commit updates
ansible-core from 2.12.5 to 2.12.10.
We also drop support for two ansible-core versions at once, and now use
the "oldest" still-supported ansible-core version, as 2.11 is both EOL and
not supported by molecule.
tests/molecule: remove linting in molecule to support molecule 5
tests/molecule: remove role name check for molecule 5 support
Kubespray doesn't use ansible galaxy style naming so we have to disable
that check.
contrib/inventory_builder: fix tox.ini for tox4
tests/molecule: fix get_playbook in testinfra tests
tests: upgrade most tests requirements
Exclude ansible-lint for now, I will do that in a separate PR.
tests/molecule: force kvm driver option
If we don't do this, it falls back to emulated qemu on our CI for some
reason.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
* Fix `The task includes an option with an undefined variable` for 1.27
* delete old flag --container-runtime
Signed-off-by: Victor Login <batazor@evrone.com>
---------
Signed-off-by: Victor Login <batazor@evrone.com>
Add option to configure class as the default class
Add option to disable watching for ingresses without class
Remove redundant if that always evaluates to true
Fix missing default value for ingress_nginx_default
about: Support request or question relating to Kubespray
labels: kind/support
---
<!--
STOP -- PLEASE READ!
GitHub is not the right place for support requests.
If you're looking for help, check [Stack Overflow](https://stackoverflow.com/questions/tagged/kubespray) and the [troubleshooting guide](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/).
You can also post your question on the [Kubernetes Slack](http://slack.k8s.io/) or the [Discuss Kubernetes](https://discuss.kubernetes.io/) forum.
If the matter is security related, please disclose it privately via https://kubernetes.io/security/.
If you have questions, check the documentation at [kubespray.io](https://kubespray.io) and join us on the [kubernetes slack](https://kubernetes.slack.com), channel **\#kubespray**.
You can get your invite [here](http://slack.k8s.io/)
- Can be deployed on **[AWS](docs/aws.md), GCE, [Azure](docs/azure.md), [OpenStack](docs/openstack.md), [vSphere](docs/vsphere.md), [Equinix Metal](docs/equinix-metal.md) (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal**
- Can be deployed on **[AWS](docs/cloud_providers/aws.md), GCE, [Azure](docs/cloud_providers/azure.md), [OpenStack](docs/cloud_providers/openstack.md), [vSphere](docs/cloud_providers/vsphere.md), [Equinix Metal](docs/cloud_providers/equinix-metal.md) (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal**
- **Highly available** cluster
- **Composable** (Choice of the network plugin for instance)
- Supports most popular **Linux distributions**
@@ -19,7 +19,7 @@ Below are several ways to use Kubespray to deploy a Kubernetes cluster.
#### Usage
Install Ansible according to [Ansible installation guide](/docs/ansible.md#installing-ansible)
Install Ansible according to [Ansible installation guide](/docs/ansible/ansible.md#installing-ansible)
See [here](docs/ansible_collection.md) if you wish to use this repository as an Ansible collection
See [here](docs/ansible/ansible_collection.md) if you wish to use this repository as an Ansible collection
### Vagrant
@@ -99,7 +99,7 @@ python -V && pip -V
If this returns the version of the software, you're good to go. If not, download and install Python from here <https://www.python.org/downloads/source/>
Install Ansible according to [Ansible installation guide](/docs/ansible.md#installing-ansible)
Install Ansible according to [Ansible installation guide](/docs/ansible/ansible.md#installing-ansible)
then run the following step:
```ShellSession
@@ -109,80 +109,79 @@ vagrant up
## Documents
- [Requirements](#requirements)
- [Kubespray vs ...](docs/comparisons.md)
- [Getting started](docs/getting-started.md)
- [Setting up your first cluster](docs/setting-up-your-first-cluster.md)
- [Ansible inventory and tags](docs/ansible.md)
- [Integration with existing ansible repo](docs/integration.md)
- [Deployment data variables](docs/vars.md)
- [DNS stack](docs/dns-stack.md)
- [HA mode](docs/ha-mode.md)
- [Kubespray vs ...](docs/getting_started/comparisons.md)
- Supported Docker versions are 18.09, 19.03 and 20.10. The *recommended* Docker version is 20.10. `Kubelet` might break on docker's non-standard version numbering (it no longer uses semantic versioning). To ensure auto-updates don't break your cluster, look into e.g. the YUM ``versionlock`` plugin or ``apt pin``.
- The cri-o version should be aligned with the respective kubernetes version (i.e. kube_version=1.20.x, crio_version=1.20)
## Requirements
- **Minimum required version of Kubernetes is v1.24**
- **Ansible v2.11+, Jinja 2.11+ and python-netaddr is installed on the machine that will run Ansible commands**
- The target servers must have **access to the Internet** in order to pull docker images. Otherwise, additional configuration is required (See [Offline Environment](docs/offline-environment.md))
- **Minimum required version of Kubernetes is v1.28**
- **Ansible v2.14+, Jinja 2.11+ and python-netaddr is installed on the machine that will run Ansible commands**
- The target servers must have **access to the Internet** in order to pull docker images. Otherwise, additional configuration is required (See [Offline Environment](docs/operations/offline-environment.md))
- The target servers are configured to allow **IPv4 forwarding**.
- If using IPv6 for pods and services, the target servers are configured to allow **IPv6 forwarding**.
- The **firewalls are not managed**, you'll need to implement your own rules the way you used to.
@@ -225,41 +224,41 @@ These limits are safeguarded by Kubespray. Actual requirements for your workload
You can choose among ten network plugins. (default: `calico`, except Vagrant uses `flannel`)
- [Calico](https://docs.projectcalico.org/latest/introduction/) is a networking and network policy provider. Calico supports a flexible set of networking options
- [Calico](https://docs.tigera.io/calico/latest/about/) is a networking and network policy provider. Calico supports a flexible set of networking options
designed to give you the most efficient networking across a range of situations, including non-overlay
and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts,
pods, and (if using Istio and Envoy) applications at the service mesh layer.
- [cilium](http://docs.cilium.io/en/latest/): layer 3/4 networking (as well as layer 7 to protect and secure application protocols), supports dynamic insertion of BPF bytecode into the Linux kernel to implement security services, networking and visibility logic.
- [weave](docs/weave.md): Weave is a lightweight container overlay network that doesn't require an external K/V database cluster.
- [weave](docs/CNI/weave.md): Weave is a lightweight container overlay network that doesn't require an external K/V database cluster.
(Please refer to `weave` [troubleshooting documentation](https://www.weave.works/docs/net/latest/troubleshooting/)).
- [kube-ovn](docs/kube-ovn.md): Kube-OVN integrates the OVN-based Network Virtualization with Kubernetes. It offers an advanced Container Network Fabric for Enterprises.
- [kube-ovn](docs/CNI/kube-ovn.md): Kube-OVN integrates the OVN-based Network Virtualization with Kubernetes. It offers an advanced Container Network Fabric for Enterprises.
- [kube-router](docs/kube-router.md): Kube-router is a L3 CNI for Kubernetes networking aiming to provide operational
- [kube-router](docs/CNI/kube-router.md): Kube-router is a L3 CNI for Kubernetes networking aiming to provide operational
simplicity and high performance: it uses IPVS to provide Kube Services Proxy (if setup to replace kube-proxy),
iptables for network policies, and BGP for pods L3 networking (with optionally BGP peering with out-of-cluster BGP peers).
It can also optionally advertise routes to Kubernetes cluster Pods CIDRs, ClusterIPs, ExternalIPs and LoadBalancerIPs.
- [macvlan](docs/macvlan.md): Macvlan is a Linux network driver. Pods have their own unique Mac and Ip address, connected directly to the physical (layer 2) network.
- [macvlan](docs/CNI/macvlan.md): Macvlan is a Linux network driver. Pods have their own unique Mac and Ip address, connected directly to the physical (layer 2) network.
- [multus](docs/multus.md): Multus is a meta CNI plugin that provides multiple network interface support to pods. For each interface Multus delegates CNI calls to secondary CNI plugins such as Calico, macvlan, etc.
- [multus](docs/CNI/multus.md): Multus is a meta CNI plugin that provides multiple network interface support to pods. For each interface Multus delegates CNI calls to secondary CNI plugins such as Calico, macvlan, etc.
- [custom_cni](roles/network-plugin/custom_cni/): You can specify some manifests that will be applied to the clusters to bring your own CNI and use ones not supported by Kubespray.
See `tests/files/custom_cni/README.md` and `tests/files/custom_cni/values.yaml` for an example with a CNI provided by a Helm Chart.
The network plugin to use is defined by the variable `kube_network_plugin`. There is also an
option to leverage built-in cloud provider networking instead.
See also [Network checker](docs/netcheck.md).
See also [Network checker](docs/advanced/netcheck.md).
## Ingress Plugins
- [nginx](https://kubernetes.github.io/ingress-nginx): the NGINX Ingress Controller.
- [metallb](docs/metallb.md): the MetalLB bare-metal service LoadBalancer provider.
- [metallb](docs/ingress/metallb.md): the MetalLB bare-metal service LoadBalancer provider.
## Community docs and resources
@ -280,4 +279,4 @@ See also [Network checker](docs/netcheck.md).
The Kubespray Project is released on an as-needed basis. The process is as follows:
1. An issue is proposing a new release with a changelog since the last release. Please see [a good sample issue](https://github.com/kubernetes-sigs/kubespray/issues/8325)
2. At least one of the [approvers](OWNERS_ALIASES) must approve this release
3. The `kube_version_min_required` variable is set to `n-1`
4. Remove hashes for [EOL versions](https://github.com/kubernetes/website/blob/main/content/en/releases/patch-releases.md) of kubernetes from `*_checksums` variables.
5. Create the release note with [Kubernetes Release Notes Generator](https://github.com/kubernetes/release/blob/master/cmd/release-notes/README.md). See the following `Release note creation` section for the details.
6. An approver creates [new release in GitHub](https://github.com/kubernetes-sigs/kubespray/releases/new) using a version and tag name like `vX.Y.Z` and attaching the release notes
7. An approver creates a release branch in the form `release-X.Y`
8. The corresponding version of [quay.io/kubespray/kubespray:vX.Y.Z](https://quay.io/repository/kubespray/kubespray) and [quay.io/kubespray/vagrant:vX.Y.Z](https://quay.io/repository/kubespray/vagrant) container images are built and tagged. See the following `Container image creation` section for the details.
9. The `KUBESPRAY_VERSION` variable is updated in `.gitlab-ci.yml`
10. The release issue is closed
11. An announcement email is sent to `dev@kubernetes.io` with the subject `[ANNOUNCE] Kubespray $VERSION is released`
12. The topic of the #kubespray channel is updated with `vX.Y.Z is released! | ...`
1. At least one of the [approvers](OWNERS_ALIASES) must approve this release
1. (Only for major releases) The `kube_version_min_required` variable is set to `n-1`
1. (Only for major releases) Remove hashes for [EOL versions](https://github.com/kubernetes/website/blob/main/content/en/releases/patch-releases.md) of kubernetes from `*_checksums` variables.
1. Create the release note with [Kubernetes Release Notes Generator](https://github.com/kubernetes/release/blob/master/cmd/release-notes/README.md). See the following `Release note creation` section for the details.
1. An approver creates [new release in GitHub](https://github.com/kubernetes-sigs/kubespray/releases/new) using a version and tag name like `vX.Y.Z` and attaching the release notes
1. (Only for major releases) An approver creates a release branch in the form `release-X.Y`
1. (For major releases) On the `master` branch: bump the version in `galaxy.yml` to the next expected major release (X.y.0 with y = Y + 1), make a Pull Request.
1. (For minor releases) On the `release-X.Y` branch: bump the version in `galaxy.yml` to the next expected minor release (X.Y.z with z = Z + 1), make a Pull Request.
1. The corresponding version of [quay.io/kubespray/kubespray:vX.Y.Z](https://quay.io/repository/kubespray/kubespray) and [quay.io/kubespray/vagrant:vX.Y.Z](https://quay.io/repository/kubespray/vagrant) container images are built and tagged. See the following `Container image creation` section for the details.
1. (Only for major releases) The `KUBESPRAY_VERSION` in `.gitlab-ci.yml` is upgraded to the version we just released # TODO clarify this, this variable is for testing upgrades.
1. The release issue is closed
1. An announcement email is sent to `dev@kubernetes.io` with the subject `[ANNOUNCE] Kubespray $VERSION is released`
1. The topic of the #kubespray channel is updated with `vX.Y.Z is released! | ...`
# Running systemd-machine-id-setup doesn't create a unique id for each node container on Debian,
# handle manually
- name: Re-create unique machine-id (as we may just get what comes in the docker image), needed by some CNIs for mac address seeding (notably weave)  # noqa 301
- name: Re-create unique machine-id (as we may just get what comes in the docker image), needed by some CNIs for mac address seeding (notably weave)
- name: Ensure Gluster brick and mount directories exist.
  file: "path={{ item }} state=directory mode=0775"
  file:
    path: "{{ item }}"
    state: directory
    mode: 0775
  with_items:
    - "{{ gluster_brick_dir }}"
    - "{{ gluster_mount_dir }}"

- name: Configure Gluster volume with replicas
  gluster_volume:
  gluster.gluster.gluster_volume:
    state: present
    name: "{{ gluster_brick_name }}"
    brick: "{{ gluster_brick_dir }}"
    replicas: "{{ groups['gfs-cluster'] | length }}"
    cluster: "{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip']|default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
    cluster: "{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip'] | default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
    host: "{{ inventory_hostname }}"
    force: yes
  run_once: true
  when: groups['gfs-cluster']|length > 1
  when: groups['gfs-cluster'] | length > 1

- name: Configure Gluster volume without replicas
  gluster_volume:
  gluster.gluster.gluster_volume:
    state: present
    name: "{{ gluster_brick_name }}"
    brick: "{{ gluster_brick_dir }}"
    cluster: "{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip']|default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
    cluster: "{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip'] | default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
If you want to use another distribution than Ubuntu 18.04 (Bionic) LTS, you can modify the search filters of the 'data "aws_ami" "distro"' in variables.tf.
## Using other distributions than Ubuntu
For example, to use:
To leverage a Linux distribution other than Ubuntu 18.04 (Bionic) LTS for your Terraform configurations, you can adjust the AMI search filters within the 'data "aws_ami" "distro"' block by utilizing variables in your `terraform.tfvars` file. This approach ensures a flexible configuration that adapts to various Linux distributions without directly modifying the core Terraform files.
- Debian Jessie, replace 'data "aws_ami" "distro"' in variables.tf with

```ini
data "aws_ami" "distro" {
  most_recent = true

  filter {
    name   = "name"
    values = ["debian-jessie-amd64-hvm-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
```

### Example Usages

- **Debian Jessie**: To configure the usage of Debian Jessie, insert the subsequent lines into your `terraform.tfvars`:

```hcl
ami_name_pattern = "debian-jessie-amd64-hvm-*"
ami_owners = ["379101102735"]
```

- **Ubuntu 16.04**: To utilize Ubuntu 16.04 instead, apply the following configuration in your `terraform.tfvars`:
sed -i 's/^#*Port [0-9]*/Port ${ssh_port}/' /etc/ssh/sshd_config
}
configure_ssh_port
#################################################
##
## Hostname
##
hostnamectl set-hostname ${hostname}
#################################################
##
## Disable swap files generated by systemd-gpt-auto-generator
##
systemctl mask "dev-sda3.swap"