Only checking the kubernetes api on the first master when upgrading is not enough.
Each master needs to be checked before its upgrade.
Signed-off-by: Rick Haan <rickhaan94@gmail.com>
* cherry-pick bump crio version to 1.19 (#6758)
cherry-pick modifications:
* keep default to 1.17 as release 2.14 came with
* don't change readme with newer versions
* bump crio version to 1.19
* crio package name has changed for debian/ubuntu
* crio upgrade does not work, see #6757
* update crio info in docs
* Install cri-o with package version (#6853)
and thereby support upgrade from e.g. 1.18.x to 1.19.y
Included OSes:
- Centos7/8
- Ubuntu18/20
New variables for overriding by default installed packages:
- centos_crio_packages
- ubuntu_crio_packages
* add support crio version for various k8s vers (#7003)
* add support crio version for various k8s vers
* regexp in pkg versions
Co-authored-by: Hans Feldt <2808287+hafe@users.noreply.github.com>
Co-authored-by: Sergey <s.bondarev@southbridge.ru>
This new version uses the same base image as kube-proxy
(k8s.gcr.io/build-image/debian-iptables)
This allows it to automatically pick iptables-legacy or iptables-nft,
and to be compatible with RHEL/CentOS 8
https://github.com/kubernetes/dns/pull/367
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
(cherry picked from commit e909f84966)
RedHat 8.3 merged nf_conntrack_ipv4 into nf_conntrack but still advertises kernel 4.18,
so just try to modprobe and decide depending on the success
Also nf_conntrack is a dependency of ip_vs, so no need to care about it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
(cherry picked from commit 00e0f3bd2b)
* fix flake8 errors in Kubespray CI - tox-inventory-builder
* fix flake8 errors in Kubespray CI - tox-inventory-builder
* Invalidate CRI-O kubic repo's cache
Signed-off-by: Victor Morales <v.morales@samsung.com>
* add support to configure pkg install retries
and use in CI job tf-ovh_ubuntu18-calico (due to it failing often)
* Switch Calico and Cilium image repos to Quay.io
Co-authored-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Barry Melbourne <9964974+bmelbourne@users.noreply.github.com>
Conflicts:
roles/download/defaults/main.yml
* up vagrant box to fedora/33-cloud-base in cri-o molecule tests
(cherry picked from commit 06ec5393d7)
* add Google proxy-mirror-cache for docker hub to CI tests
(cherry picked from commit d739a6bb2f)
* containerd docker hub registry mirror support
* containerd docker hub registry mirror support
* add docs
* fix typo
* fix yamllint
* fix indent in sample
and ansible-playbook param in testcases_run
* fix md
* mv common vars to tests/common/_docker_hub_registry_mirror.yml
* checkout vars to upgrade tests
(cherry picked from commit 4a8a52bad9)
* Exclude .git/ from shellcheck
If a branch name contains '.sh', shellcheck currently checks the branch
file under .git/ and reports errors because that file is not a shell
script.
This makes shellcheck exclude files under .git/ to avoid this issue.
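The filtering idea can be sketched in Python as below. This is only an illustration of the rule "skip anything under .git/", not the actual CI script; the function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def shell_scripts(paths):
    """Return paths that look like shell scripts, skipping anything under .git/."""
    selected = []
    for raw in paths:
        path = PurePosixPath(raw)
        if ".git" in path.parts:
            # e.g. .git/refs/heads/fix-install.sh is a ref file, not a script
            continue
        if path.suffix == ".sh":
            selected.append(raw)
    return selected


# Hypothetical file list: one real script, one branch ref ending in .sh, one doc
candidates = [
    "./scripts/collect-info.sh",
    "./.git/refs/heads/fix-install.sh",
    "./README.md",
]
print(shell_scripts(candidates))
```

Only the script outside .git/ is kept, so a branch named like a shell script no longer reaches the linter.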
(cherry picked from commit e2467d87b6)
Co-authored-by: Hans Feldt <2808287+hafe@users.noreply.github.com>
Co-authored-by: Sergey <s.bondarev@southbridge.ru>
Co-authored-by: Kenichi Omichi <ken-oomichi@wx.jp.nec.com>
* remove podman cni plugin
* configure NetworkManager global dns
* allow installation of python3-libselinux by temporarily disabling the update repo
* remove ipv4 section because it is not a valid configuration
Removes these startup warnings:
Warning: For remote container runtime, --pod-infra-container-image is ignored in kubelet, which should be set in that remote runtime instead
Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
* add snapshot-controller and v1beta1 snapshot api
* fix typo
* update manifest to v1beta1
* update
* update manifests
* fix spelling
* wait until crd is applied
* fix missing info in kube module
* revert snapshotclass
* add snapshot crds before applying the csi driver
* add crds, missed them in last commit
* use pull policy from kubespray
* Update CustomResourceDefinition for kubecontrollersconfigurations.crd.projectcalico.org to v1
* Align ClusterRole for kube-controllers with upstream (calico)
* Add support for openstack application credentials
* Add some lines for readability
* Update external_openstack_tenant_id check
Do not check external_openstack_tenant_id when application credentials are defined
* Add check for external_openstack_domain_id
* Fix typo
By default, do not allow "unqualified" images (without a registry),
because they are considered insecure and subject to MITM attacks.
To enable these insecure pulls, configure for example:
crio_registries:
- "docker.io"
- "quay.io"
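What "unqualified" means can be sketched in Python as below. This follows the common container-reference heuristic (the first path component is a registry if it contains a dot or a port, or is "localhost"); it is an assumption for illustration, and CRI-O's own resolution logic may differ.

```python
def is_qualified(image):
    """True when the image reference starts with a registry host."""
    first, sep, _rest = image.partition("/")
    if not sep:
        # "nginx" or "nginx:1.19" has no registry component at all
        return False
    # Heuristic (assumption): a registry host has a dot, a port, or is localhost
    return "." in first or ":" in first or first == "localhost"


for ref in ("nginx", "library/nginx", "quay.io/coreos/etcd", "localhost:5000/app"):
    print(ref, is_qualified(ref))
```

References such as `nginx` or `library/nginx` are the ones that would be refused unless registries are listed in `crio_registries`.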
* Use proper openssl command to differentiate between host and ip in current certificate check
* fixup! Use proper openssl command to differentiate between host and ip in current certificate check
The bootstrap-os role uses a bootstrap script to provision a
python interpreter on flatcar and container os hosts. As the
pypy project switched to another hosting provider, the download url changed.
If applied, this will use the new pypy download url in the bootstrap script.
* Update the cilium svc proxy test to HA mode
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix cilium strict kube-proxy in HA
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add a single global endpoint variable
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add cilium docs about kube-proxy replacement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix issues in docs
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
Nit alert. Sample inventory throws an error when processed
by yamllint. The default line is currently commented out.
However, when uncommenting it our linters fail.
* Option for MetalLB to talk BGP
* Check for BGP peers when metallb_protocol is bgp
* README clarification
* Commented values as documentation only in the sample inventory
* layer 2 or BGP, not both
* log level by default increased to 'info'
* cgroup manager by default set to 'systemd'
* stream port (used by kubelet) bound to 127.0.0.1 for security reasons
* metrics can be enabled and port specified
To make it less confusing for users who uncomment the whole block of
local path provisioner settings [1], the samples should point at least to
version 0.0.3, which supports the helper image [2] configured by the
local_path_provisioner_helper_image_repo variable. As 0.0.3 is a bit old,
the samples could point to the current newest release, 0.0.14.
[1] 45a177e2a0 (commitcomment-38625688)
[2] 315d67fa8c
Trying to layer this package on Fedora 32 causes the install to crash
and furthermore it looks like the original bug linked to in the comment
has been resolved for Fedora 31
* Update calico_veth_mtu to FELIX_IPINIP variable
calico_veth_mtu is specified in the configuration, but since it only works for wireguard, modify it to work for IP-in-IP users.
* Update template with a cleaner expression
* Add additional metadata configuration option to external Openstack CCM (kubernetes-sigs#6338)
* Set the variable external_openstack_metadata_search_order undefined by default
* Fix kubelet cgroup driver detection for crio
Remove fact standalone_kubelet since it is not used
* Fix yamllint complaints of roles/kubernetes/node/tasks/facts.yml
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
This changes MetalLB from a contrib to one of the addons, so that MetalLB
can be deployed with the Kubernetes cluster deployment. By default,
Kubespray doesn't deploy the MetalLB addon.
inventory_builder creates hosts.yaml file with hostnames like "node1",
"node2", etc. Even if specifying override_system_hostname=false, the
output of "kubectl get nodes" shows those hostnames ("node1", etc.)
without using actual hostnames.
To solve this issue, this adds an option USE_REAL_HOSTNAME to get
actual hostnames when creating hosts.yaml file instead of "node1", etc.
Support for Ambassador OSS as an Ingress Controller when
setting `ingress_ambassador_enabled: true`.
Signed-off-by: Alvaro Saurin <alvaro.saurin@gmail.com>
The in-tree cloud provider is now deprecated, and it is recommended to
use the external cloud provider for OpenStack instead.
The doc described how to upgrade from the in-tree cloud provider, but
in the current situation it is better to describe how to deploy the
external cloud provider from scratch.
This updates the OpenStack doc for this use case.
* Install Kata Containers as additional container runtime
* Create RuntimeClasses for Kata Containers
* Updated Vagrant to optionally run without Docker as container manager
* Updated Vagrant to optionally use Libvirt nested virtualization
* Add Kata Containers documentation
* Fix lint errors
* Add kata_containers_enabled to kubespray-defaults
* Fixed typo error
* Fixed typo error
We need to specify either external_openstack_tenant_name or
external_openstack_tenant_id. Those values were checked by seeing they
are defined or they have actual values separately.
However those values are always defined because of the following code
of openstack/defaults/main.yml:
external_openstack_tenant_id: "{{ lookup('env','OS_TENANT_ID')| default(lookup('env','OS_PROJECT_ID'),true) }}"
external_openstack_tenant_name: "{{ lookup('env','OS_TENANT_NAME')| default(lookup('env','OS_PROJECT_NAME'),true) }}"
So even if not specifying both values, those checks could not detect
the misconfiguration. This fixes this to detect the misconfiguration.
* MINOR: Check kernel version before enable modprobe nf_conntrack
* CLEANUP: no more need to ignore error of this task
* MINOR: Fixing yaml and ansible lint error - remove trailing-space
If the special parameter "$@" is not quoted, the following command will not work:
./kubectl.sh patch storageclass my-storage-class -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
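Why the quoting matters can be approximated in Python as below. This only emulates the word splitting a shell applies to an unquoted `$@` (it ignores globbing), and the function names are hypothetical.

```python
def expand_unquoted(args):
    """Approximate an unquoted $@: every argument is re-split on whitespace."""
    return [word for arg in args for word in arg.split()]


def expand_quoted(args):
    """Approximate a quoted "$@": every argument is passed through unchanged."""
    return list(args)


patch = '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
args = ["patch", "storageclass", "my-storage-class", "-p", patch]

print(len(expand_quoted(args)))    # the JSON patch stays one argument
print(len(expand_unquoted(args)))  # the space inside the JSON splits it in two
```

Without the quotes, kubectl receives a truncated JSON document as the value of `-p` plus a stray extra argument.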
* Add oraclelinux8 and disable firewalld
Add oraclelinux8 image and disable firewalld on oraclelinux VMs
* Fix Oracle Linux repositories
As documented in: http://yum.oracle.com/getting-started.html#installing-software-from-oracle-linux-yum-server
public-yum-ol7.repo was deprecated on release 7.6. Some repos were integrated into oracle-linux-ol7.repo (i.e.: ol7_latest, ol7_addons) and other are available as packages (epel). This also adds support for oraclelinux8
* Fix to use ansible_distribution_version
Instead of ansible_distribution_major_version
* Update README.md
Historically, OpenStack used the term "tenant" for a separated namespace.
However, the term "project" is used now instead.
As "tenant" has been replaced with "project", all "TENANT" variables
have also been renamed to "PROJECT".
This makes Kubespray search "PROJECT" variable also for newer OpenStack
clouds.
- Change True/False to true/false in a few places so the file can
be more easily round-tripped with the Python ruamel.yaml library
flannel, ovn and multus network plugins did not support all taint keys. This
update changes the tolerations to support them all.
According to the documentation:
```
There are two special cases: An empty key with operator Exists matches all keys,
values and effects which means this will tolerate everything. An empty effect matches
all effects with key key.
```
Using an empty `key` and `effect` ensures the network plugin daemonset will
be deployed on every node (e.g. in case of custom taints, or the NoExecute effect)
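The matching rule quoted above can be sketched in Python as follows. This is a simplified model of how a toleration is compared with a taint, written for illustration only; it is not the Kubernetes implementation, and the sample taint is hypothetical.

```python
def tolerates(toleration, taint):
    """Simplified model of whether a toleration matches a taint."""
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    effect = toleration.get("effect", "")

    # An empty effect matches all effects
    if effect and effect != taint["effect"]:
        return False
    # An empty key with operator Exists matches all keys and values
    if not key:
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


custom_taint = {"key": "dedicated", "value": "gpu", "effect": "NoExecute"}
print(tolerates({"operator": "Exists"}, custom_taint))
print(tolerates({"key": "CriticalAddonsOnly", "operator": "Exists"}, custom_taint))
```

The catch-all toleration matches the custom taint, while a toleration bound to a specific key does not.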
Since MetalLB v0.8[1], metallb:speaker has started publishing an event
nodeAssigned on k8s resource.
To support MetalLB v0.8+, this allows metallb:speaker to create events.
[1]: 5cc6e23776 (diff-60053ad6fecb5a3cfabb6f3d9e720899R246)
Since weave 2.5.1, the `NoExecute` taint effect is no longer supported,
this changes the daemonset tolerations to change this behavior.
Also remove the toleration key `CriticalAddonsOnly`, which is not required anymore.
If running MetalLB v0.7.3 on k8s v1.18.2, metallb pods output the
following parsing error of v1.ServiceList:
$ kubectl logs controller-dbb46cf84-fw8h8 -n metallb-system
{
"caller":"reflector.go:205",
"level":"error",
"msg":"go.universe.tf/metallb/internal/k8s/k8s.go:231:
Failed to list *v1.Service: v1.ServiceList:
Items: []v1.Service: v1.Service: ObjectMeta:
v1.ObjectMeta: readObjectFieldAsBytes:
expect : after object field, parsing 1605
Then an external IP address is never allocated to the Service of
LoadBalancer type.
By updating MetalLB version to the latest v0.9[1] today, this issue
can be solved.
[1]: https://hub.docker.com/r/metallb/controller/tags
* fix(kubelet): exec notify restart kubelet service when kube-config.yml changed
* Revert "refactor(kubelet handler): change task name("reload kubelet") this is misleading"
This reverts commit 8f5d29560802c7c997293adb1ce9f84d3b20b6cb.
* fix(handlers,kubelet): setting right notify task name
* update documentation to add and remove nodes
* add information about parameters to change when adding multiple etcd nodes
* add information about reset_nodes
* add documentation about adding existing nodes to etcd masters.
* Add additional network configuration options to external Openstack CCM (#6083)
* Change the default version of external openstack cloud controller image to v1.18.1 since there was an issue in v1.18.0 where some IPs of the private network were ignored
* Change Network section in external-openstack-cloud-config.j2 to Networking
* Add networking customization information in the openstack documentation
This updates MetalLB README as follows
- Remove unnecessary markdown to make it easier to read on github
- Make wording consistent (kubernetes, loadbalancer)
- Add change-required option
Because the Azure README lacks the requirements installation step, the
following error can happen:
"The ipaddr filter requires python's netaddr be installed on the
ansible controller"
This adds the installation step for Azure users.
The 98e7a07fba commit updates the
dashboard version to 2.0.0 but the enable-skip-login flag wasn't
updated. This change fixes its indentation to avoid issues when
dashboard_skip_login is enabled.
apply-rg.sh was for Azure command version 1 ("azure" command), which is
old; version 2 ("az" command) is officially used today.
apply-rg_2.sh was for the version 2. In addition, the README[1] says
we need to run apply-rg.sh for applying templates.
This renames apply-rg_2.sh to apply-rg.sh for common usages of the
version 2.
[1]: https://github.com/kubernetes-sigs/kubespray/tree/master/contrib/azurerm#generating-and-applying
The ansible-playbook needs to ssh-login to Azure virtual machines with
ssh keypair, and users need to specify ssh_public_keys for their own
ssh public key. The change of ssh_public_keys is mandatory.
So this updates contrib/azurerm/README.md to explain that.
In addition, the path of all.yml was wrong. That also is updated with
this.
apply-rg_2.sh uses 'az group deployment' command but the command is
deprecated like the following warning message:
"This command is implicitly deprecated because command group
'group deployment' is deprecated and will be removed in a future release.
Use 'deployment group' instead."
This updates these deprecated commands.
FYI: The command has been deprecated since [1] on azure-cli side.
[1]: 991cb7cc7c (diff-2057bbb8441166e4910b34b09d22b58cR222)
* bump to dashboard 2.0 rc6 with metrics scraper
* fix missing yaml separator making Replicaset complain about missing ServiceAccount
* remove unwanted legacy hack that was forgotten earlier
* no need namespace on CrBinding
* bump to 2.0.0 release
* remove dashboard_metrics_scrapper_enabled
* add strategy mitogen_linear when installed mitogen
* add small docs
Rename playbook file
The raw action executes as a regular Mitogen connection, which requires Python on the target, so add strategy: linear to bootstrap-os role playbook.
* add mitogen to CI test
fix typo
* enable mitogen test on deploy-part1 tests
change version from master to release
download tar.gz archive
* run all CI tests with mitogen
* disable mitogen with upgrade CI tests
* enable mitogen on CI tests via env vars
* disable mitogen on CI test by default, enable on some different OS
* disable mitogen CI test on centos8
(get error /usr/bin/python: No such file or directory)
* replace removed repo with kubic repository for centos 7
* add crio configuration for centos8
* add crio configurations for debian
* use correct crio version for fedora
* simplify calculation of required crio version
- gives possibility to overwrite
* change default path for runc
* change default for seccomp path
* change default for conmon
* declare kubic repo for ubuntu
* do not install crictl twice
* move fedora repo modular tasks to crio_repo file
* move centos repo tasks to crio_repo
* declare crio version matrix for ubuntu
* update documentation crio support for ubuntu
cloud_provider option exists in ./inventory/sample/group_vars/all/all.yml
In addition, the quick start shows to create configuration by copying
./inventory/sample. So this updates path of all.yml for fitting the above.
* Add proxy support to CRI-O service
The crio.service requires proxy environment variables when it's
deployed behind a corporate network. This change creates a systemd
configuration file when the proxy variables are defined.
* Remove unnecessary crio tasks
The playbook that bootstraps openSUSE servers assumes that the
/etc/sysconfig/proxy file exists, but the execution fails when
this file is not present. This change guarantees its existence.
The Terraform installation part states that is for CentOS 7, but the echo command refers to OS X binary. Updated the echo command to use the Linux version.
* assemble fallback_ips and no_proxy vars only once on localhost and populate the result on all hosts
* add tag always, fix ansible lint errors
* workaround to mitogen issue dw/mitogen#663
* do not gather fact before install python on coreos like distros
* try to pass docker molecule test
* fix upgrade of crio on fcos
- update documents
* install conntrack required by kube-proxy
- like commit 48c41bcbe7
* enable fedora modular repo for crio
* allow to override crio configuration
- set cgroup manager same to kubelet_cgroup_driver if defined
- path of seccomp_profile depends on distribution
* allow to override crio configuration
- fix path for ubuntu
* allow to override crio configuration
- fix cni path for fcos
* added required permissions for querying endpointslice resources
* copy-pasted role permissions from cilium install manifests
* bumped cilium version to v1.7.2
* Fix proxy and module_hotfixes
On CentOS 8 with a proxy, Ansible renders the `proxy` and `module_hotfixes` options on a single line.
For example:
`proxy=http://127.0.0.1:3128module_hotfixes=True`
But expected result:
```
proxy=http://127.0.0.1:3128
module_hotfixes=True
```
* Use ini_file module for work with ini files
* Prevent duplicate proxy= options in /etc/yum.conf
Module `lineinfile` is too weak for this; use the more powerful `ini_file` module and add or remove `proxy=` depending on whether `http_proxy` is defined.
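The add-or-remove behaviour can be sketched with Python's `configparser`. This only illustrates treating yum.conf as an INI file with a `[main]` section; it is not the Ansible `ini_file` module, and the function name and sample values are hypothetical.

```python
import configparser
import io


def set_yum_proxy(conf_text, http_proxy):
    """Set or remove the proxy option in the [main] section of a yum.conf text."""
    # strict=False tolerates an input that already contains duplicate options
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        parser.add_section("main")
    if http_proxy:
        parser.set("main", "proxy", http_proxy)
    else:
        parser.remove_option("main", "proxy")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


broken = "[main]\ngpgcheck=1\nproxy=http://old:3128\nproxy=http://old:3128\n"
print(set_yum_proxy(broken, "http://127.0.0.1:3128"))
```

Because the option is addressed by section and key rather than by line, repeated runs leave a single `proxy=` entry, and an empty `http_proxy` removes it.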
* etcd: etcd-events doesn't depend on etcd_cluster_setup
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: remove condition already present on include_tasks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: fix scaling up
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use *access_addresses, do not delegate to etcd[0]
We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use failed_when for health check
unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages
Also use run_once
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/master: regenerate apiserver cert if needed
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fix chicken and egg problem with proxy_env not defined on the first environment usage.
* Disable fact gathering for the first proxy_env evaluation.
* Move proxy_env var set up from the role defaults to the root playbooks as fact.
* requirements.txt: Bump versions
Ansible 2.8+ allow ansible_python_interpreter autodetection
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* tests: do not force ansible_python_interpreter
we do not expect people to set ansible_python_interpreter, so we should not set it in the CI
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add CentOS 8 Calico to CI
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes-sigs-kubespray #5824
Added support nodes which are part of Virtual Machine Scale Sets(VMSS)
* kubernetes-sigs-kubespray #5824
* kubernetes-sigs-kubespray #5824
Added comments and updated azure docs.
* kubernetes-sigs-kubespray #5824
Added supported values comments for "azure_vmtype" in azure.yml
Before this commit, the bastion entry in the inventory was not honored,
so machines behind firewalls or with unrouted addresses were not
reachable for ansible.
The variable is defined in `kubernetes/preinstall` role and used in several roles. Since `kubernetes/preinstall` is not always included when `ansible-playbook` is run with tag selectors (see #5734 for reason), they will fail, or individual roles must copy the same fact definitions (as in #3846). Moving the definition to the always-included `kubespray-defaults` role will resolve the dependency problem.
- This solves issue #5721 & #5713 (dupes)
- Provide a cleaner default usage pattern for the download role
around etcd that supports 'host' and 'docker' properly
- Extract the 'etcdctl' as a separate task install piece and reuse it where
appropriate
- Update the kubeadm-etcd task to reflect the above change
* fedora coreos support
- bootstrap and new fact for
* fedora coreos support
- fix bootstrap condition
* fedora coreos support
- allow customize packages for fedora coreos bootstrap
* fedora coreos support
- prevent installing python3 and epel via dnf for fedora coreos
* fedora coreos support
- handle all ostree like os in same way
* fedora coreos support
- handle all ostree like os in same way for crio
* fedora coreos support
- add fcos documentations
* Support configuring the insert mode
Defaults to the upstream default https://docs.projectcalico.org/v3.9/reference/felix/configuration
so nothing should change for existing deployments.
This allows coexistence with other firewall management technologies.
* Add a note to the sample config
* Add docker-ce 19.03 packages for Debian & Ubuntu
K8s has updated the recommended Docker version to 19.03. More
specifically it should be 19.03.4, but since we used 18.06.7 instead of
.2, I'm assuming the latest patch version should be used here as well.
* Add docker 19.03 for redhat
The 'regexp' parameter matches last occurrence of a line starting with 'proxy=' and replaces it with the one defined in 'line' parameter. If no match - it works same way as before. This fixes resuming cluster deployments failed after that task (if there was no more than one line starting with 'proxy' in the yum.conf file - this condition should also be reassured with the change introduced here) eg. if they were initiated with Terraform.
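The replace-last-match-or-append behaviour described above can be modelled in Python as below. This is a simplified sketch of the semantics, not the `lineinfile` module itself; the function name and sample lines are hypothetical.

```python
import re


def ensure_line(lines, regexp, line):
    """Replace the last line matching regexp with line, or append line if none match."""
    pattern = re.compile(regexp)
    matches = [i for i, existing in enumerate(lines) if pattern.search(existing)]
    result = list(lines)
    if matches:
        result[matches[-1]] = line
    else:
        result.append(line)
    return result


conf = ["[main]", "gpgcheck=1", "proxy=http://old:3128"]
once = ensure_line(conf, r"^proxy=", "proxy=http://new:3128")
twice = ensure_line(once, r"^proxy=", "proxy=http://new:3128")
print(once)
print(once == twice)
```

Running the task a second time finds the existing `proxy=` line and rewrites it in place instead of appending another one, which is what makes resumed deployments safe.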
refs #5277
As the issue describes, when no external or local load balancer is used,
kube-proxy won't be able to contact apiserver at 127.0.0.1. So the
config map should be left as is.
* Upgrade etcd to 3.3.18
* Try with etcd 3.3.15 (kubeadm 1.16.7 default)
* Back to square one
* Try with 3.3.11
* Upgrade etcd to 3.3.18 (take 2)
* Try with 3.3.12
* add documentation for how to upgrade to the new external cloud provider
* add migrate_openstack_provider playbook
* fix codeblock syntax highlight
* make docs for migrating cloud provider better
* update grammar
* fix typo
* Make sure the code is correct markdown
* remove Fenced code blocks
* fix markdown syntax
* remove extra lines and fix trailing spaces
* download file
* download containers
* fix push image to nodes
* pull if the image is not on the host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase fix merge conflict
* add support download_run_once when upgrade and scale cluster
add some test with download_run_once
* set default values to temp flag for every download cycle
* add save,load ability for containerd and crio when download_run_once=true
* return redefine image save/load command to set_docker_image_facts.yml
* move set command to set_container_facts
* ctr in containerd_bin_dir
* fix order of ctr image export arguments
* temporarily disable download_run_once for containerd and crio
due to https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded to docker on remote sites
* remove XXX from debug output
* Run 'container-engine' after drain.
Move possibly disruptive role 'container-engine' to run after the node
is drained.
As that role has to be run on non-cluster nodes as well (etcd and
calico-rr), and those nodes are not drained, add play for that case.
* Check if api is up before upgrade.
If container engine is restarted in previous role, api controller can
take some time to start. This check ensures api is up before upgrade.
When kube-router is used as cni, rules might be added to the mangle table
to support external IPs. Therefore, mangle table should be flushed during
reset as well.
This 38688a4486 change replaces the
value for dockerproject_.+_repo_.+ docker variables but their new
value was previously defined in other variables. This change removes
the dockerproject_.+_repo_.+ docker variables in favor of the older
ones.
* Fix incorrect assertion comparison for kube_network_node_prefix
* Ignore assertion comparison for kube_network_node_prefix when using calico
* Adding more var docs description for kube_network_node_prefix
* Fixing trailing whitespaces
* External OpenStack Cloud Controller Manager implementation
* Adding controller image tag
* Minor fixes
* Restructuring the external cloud controller to work with KubeADM
* Added in code to allow control over pull policy for local path provisioner
* change to imagePullPolicy to use globally used variable k8s_image_pull_policy
* removed unused variable from defaults
* updated contiv-etcd and cinder-csi-controllerplugin to use k8s_image_pull_policy variable
* Introduce kubelet_config_extra_args and kubelet_node_config_extra_args to pass params to kubelet via YAML config
* kubelet_config_extra_args is not the alternative
* Fix recover-control-plane to work with etcd 3.3.x and add CI
* Set default values for testcase
* Add actual test jobs
* Attempt to satisfy gitlab ci linter
* Fix ansible targets
* Set etcd_member_name as stated in the docs...
* Recovering from 0 masters is not supported yet
* Add other master to broken_kube-master group as well
* Increase number of retries to see if etcd needs more time to heal
* Make number of retries for ETCD loops configurable, increase it for recovery CI and document it
* containerd: add proxy support
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubespray-defaults: add kube_service_addresses / kube_pods_subnet to no_proxy
CIDR notation in no_proxy is supported by a lot of programs/languages,
including go: https://github.com/golang/go/issues/16704
Without that, containerd cannot talk to the API server (kube_apiserver_ip),
but it should not go through an external proxy for the nodes/pods/services
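How a CIDR entry in no_proxy is meant to be interpreted can be sketched in Python. This is only a model of the matching rule for IP addresses; it does not handle hostnames or domain suffixes, and the addresses and subnets below are illustrative values, not taken from this change.

```python
import ipaddress


def bypasses_proxy(host_ip, no_proxy):
    """True when host_ip falls inside any IP or CIDR entry of a no_proxy string."""
    address = ipaddress.ip_address(host_ip)
    for entry in no_proxy.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # hostname or domain suffix, not handled in this sketch
        if address in network:
            return True
    return False


# Illustrative values standing in for kube_service_addresses / kube_pods_subnet
no_proxy = "localhost,127.0.0.1,10.233.0.0/18,10.233.64.0/18"
print(bypasses_proxy("10.233.0.1", no_proxy))
print(bypasses_proxy("93.184.216.34", no_proxy))
```

An address inside the service subnet bypasses the proxy, while an external address still goes through it.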
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Change dockerproject.org to download.docker.com
dockerproject.org was deprecated in 2017 and has gone down.
* Restore yum repo for containerd
Change-Id: I883bb512a2164a85865b1bd4fb569af0358c8c2b
Co-authored-by: Craig Rodrigues <rodrigc@crodrigues.org>
When running with serial != 100%, like upgrade_cluster.yml, we need to apply this fixup each time
Problem was introduced in 05dc2b3a09
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Raises limit from 100 to 300 because the default is far too low
and the pod can handle 300 with the given resources.
Change-Id: Ib1eec10da3d09d198933fcfe87291587e58d7cdb
I've tested this update by deploying a containerd / etcd cluster on top CentOS7,
MetalLB + NGINX Ingress. Upgrade using upgrade-cluster.yml
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Resolves issue where kubectl cache of <v1.16 api schema
interferes with interacting with daemonsets and deployments.
Change-Id: I63b7046958f2008eb144b6da0004c598f945e0ae
* Fix crictl
* Reload systemd daemon before enabling service
* Typo
* Add crictl template
* Remove seccomp.json for ubuntu
* Set runtime path of runc for ubuntu
* Change path to conmon
There is no cri-tools package in CentOS/EPEL/Red Hat.
Additionally, cri-tools is provided into the installation via
roles/download/defaults/main.yml:104:crictl_download_url.
* Fix python3-libselinux installation for RHEL/CentOS 8
In bootstrap-centos.yml we haven't gathered the facts,
so #5127 couldn't work
Minimum ansible version to run kubespray is 2.7.8,
so ansible_distribution_major_version is defined and there is no need to default it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Restart NetworkManager for RHEL/CentOS 8
network.service doesn't exist anymore
# systemctl status network
Unit network.service could not be found.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add module_hotfixes=True to docker / containerd yum repo config
https://bugzilla.redhat.com/show_bug.cgi?id=1734081
https://bugzilla.redhat.com/show_bug.cgi?id=1756473
Without this setting you end up with the following error:
# yum install docker-ce
Failed to set locale, defaulting to C
Last metadata expiration check: 0:03:21 ago on Thu Sep 26 22:00:05 2019.
Error:
Problem: package docker-ce-3:19.03.2-3.el7.x86_64 requires containerd.io >= 1.2.2-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package containerd.io-1.2.2-3.3.el7.x86_64 is excluded
- package containerd.io-1.2.2-3.el7.x86_64 is excluded
- package containerd.io-1.2.4-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.5-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.6-3.3.el7.x86_64 is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* add support for nova servergroups
* Add documentation for openstack nova servergroups
* update to TF 0.12.12 format and fix etcd
* revert for_each change
* fix variables and formatting in main.tf
* try to avoid errors
* update variable
* Update main.tf
* Update main.tf
* update all other instance resources
Initially this was to fix a mis-indented approvers key. However, it turns
out that 'oilbeater' is not a member of kubernetes-sigs nor
kubernetes-incubator (the org this repo was migrated from). Thus this
OWNERS file is failing prow's validation check.
As a workaround I've opted to move them to emeritus_approver, which
isn't validated and can be used as a hint for other approvers in this
repo
This fixes the scenario where masters are upgraded one at a time
and coredns gets improperly scaled back up to 2 replicas.
Change-Id: I7cc9283f40efcfd61b5813c89a5805c95d901567
* Update parsing of terraform state file for 0.12.12
* Resource does not seem to have a module element but instead has
provider
* Return the boolean right away if it is already a bool, since a bool does
not have a lower method
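The boolean handling described above can be sketched as follows. This is a minimal illustration, not the inventory script's actual code: the function name `parse_bool` and the set of accepted truthy strings are assumptions.

```python
def parse_bool(value):
    """Return value as a bool, accepting real booleans and common strings."""
    # A bool has no .lower() method, so hand it back unchanged.
    if isinstance(value, bool):
        return value
    # Assumed truthy spellings; adjust to whatever the state file emits.
    return str(value).strip().lower() in ("true", "yes", "1", "on")
```

Checking the type first means the same helper works for older state files that store `"true"` as a string and newer ones that store a JSON boolean.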
* Remove the setting of ansible_ssh_user to root for all Packet
Not all servers in packet are accessed as root by default. CoreOS
systems use the `core` user. Removing this allows the user to specify
the remote user with an extra_var or in an ansible.cfg file.
* Default to root user for packet devices except on CoreOS
* Update TF_VERSION for packet in tf-validate-packet
Update TF_VERSION to 0.12.12 for gitlab-ci tf-validate-packet tests
* convert packet terraform files to TV_VERSION 4
* initialize terraform before copying the variable file to the top level dir
Kubespray Pull Request #5084 (https://github.com/kubernetes-sigs/kubespray/pull/5084) caused more problems than it solved due to limitations with the synchronize module. See comments on Kubespray Issues #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059) and #5116 (https://github.com/kubernetes-sigs/kubespray/issues/5116). Details from Ansible documentation: "Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in. ... Currently there are only a few connection types which support synchronize (ssh, paramiko, local, and docker) because a sync strategy has been determined for those connection types. Note that the connection for these must not need a password as rsync itself is making the connection and rsync does not provide us a way to pass a password to the connection. ..." Thus, reverting Pull Request #5084.
* Add support for Kubernetes 1.16.1
* Defaults to 1.16.1
* add 1.16.2 checksums and set new version as default
* correct 1.16.2 checksums and add 1.15.5 checksums
Since it is unsupported to skip upgrades, I've detailed the steps for upgrading a step at a time and removed some language that indicated it should work
When using cluster.yml or scale.yml to add/scale nodes in the existing
k8s cluster, the `kubeadm init` wouldn't run. As a result, kube-proxy
wouldn't be created, and therefore the kube-proxy deletion task would
fail, e.g. in the case where kube-router is used and "kube_proxy_remove"
is set to true. As a workaround, add ignore_errors to the kube-proxy
deletion task.
It's unnecessary and breaks when running from within a docker container:
```
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TimeoutError: Timer expired after 10 seconds
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "/usr/sbin/udevadm info --query property --name /dev/mapper/vg00-root", "msg": "Timer expired after 10 seconds", "rc": 257}
```
The script is not usable unless you are in the '.vagrant/provisioners/ansible/inventory/artifacts' folder.
This update makes this usable from anywhere.
- do not run etcd role when etcd_kubeadm_enabled == true
- remove default value 'systemd' for cgroup driver in containerd role.
this value overrides the autodetection in kubelet_cgroup_driver_detected from docker info
This makes it easy to override the gcr, quay, and docker repos with the
mirror repos in countries like China, where the default accesses are
blocked or unstable.
mydict.keys() should be converted to list,
otherwise it causes errors in loop iteration.
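A small sketch of why the conversion matters under Python 3, where `keys()` returns a view rather than a list. The dictionary contents are placeholders, and the two failure modes shown (indexing a view, mutating while iterating) are common examples, not necessarily the exact error this commit hit.

```python
checksums = {"v1.15.0": "checksum-a", "v1.14.3": "checksum-b"}

# A keys view cannot be indexed like a list.
keys_view = checksums.keys()
try:
    keys_view[0]
    indexable = True
except TypeError:
    indexable = False

# Converting to a list gives a stable, indexable snapshot.
keys_list = list(checksums.keys())
first_key = keys_list[0]

# Iterating over the snapshot makes it safe to delete entries in the loop.
for key in list(checksums.keys()):
    if key.startswith("v1.14"):
        del checksums[key]
```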
Remove extra space after class name, which broke configmap.
Also allow setting the reclaimPolicy property.
Cleaned up deprecated APIs:
apps/v1beta1
apps/v1beta2
extensions/v1beta1 for ds,deploy,rs
Add workaround for deploying helm using incompatible
deployment manifest.
Change-Id: I78b36741348f47a999df3841ee63cf4e6f377830
* Use python3-libselinux on RHEL8/Centos8
* The fact ansible_facts.distribution_major_version is not present on older Ansible versions.
Default it to 0 when not present and use libselinux-python as the package to keep the current
default behaviour.
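The fallback logic can be sketched in Python as below. The function name `selinux_python_package` is hypothetical, and the sketch assumes the fact, when present, is a numeric string as on RHEL/CentOS.

```python
def selinux_python_package(ansible_facts):
    """Pick the SELinux Python binding package for a RHEL/CentOS host."""
    # Older Ansible versions may not provide the fact; default it to 0
    # so such hosts keep getting the legacy package.
    major = int(ansible_facts.get("distribution_major_version", 0))
    return "python3-libselinux" if major >= 8 else "libselinux-python"
```

Defaulting to 0 rather than failing keeps the previous behaviour on any controller that cannot report the major version.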
Fix for Kubespray Issue #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059). There is a known issue with the 'fetch' module that will sometimes lead to it failing with a memory error. See ansible/ansible#11702 (https://github.com/ansible/ansible/issues/11702). I encountered this issue with the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml, and it caused my entire deployment to error out (see "Output of ansible run" above). Replacing 'fetch' with 'synchronize' fixes this issue.
Updated Openstack to terraform 0.12 (#5062)
* update openstack to terraform 0.12(.5)
* replace cluster.tf with cluster.tfvars
* update README.md to terraform 0.12
* update Openstack CI tests to use terraform 0.12
* specify terraform version in openstack README
* gitlab CI to copy cluster.tfvars in case of openstack provider
* The terraform/openstack dynamic inventory can read
tfstate v4 (generated by terraform 0.12) and convert them internally
to v3 (as generated by terraform 0.11.x).
Additionally the script has been updated to Python 3.
* run 'task download_container | Copy image to ansible host cache' with synchronize on download_delegate host
* try to run task copy file to ansible host on all inventory, not only on first random host
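As a rough illustration of the tfstate v4 to v3 conversion mentioned above: v4 stores each instance's attributes as nested JSON, while v3 stores a flat string map with `#` (list length) and `%` (map length) marker keys. The sketch below is an assumption-laden simplification, not the dynamic inventory script itself. The function names are hypothetical, modules are ignored, and every resource key gets an index suffix.

```python
def flatten_attributes(value, prefix=""):
    """Flatten nested v4 attributes into a v3-style flat string map."""
    flat = {}
    if isinstance(value, dict):
        if prefix:
            flat[prefix + ".%"] = str(len(value))
        for key, item in value.items():
            child = prefix + "." + key if prefix else key
            flat.update(flatten_attributes(item, child))
    elif isinstance(value, list):
        flat[prefix + ".#"] = str(len(value))
        for index, item in enumerate(value):
            flat.update(flatten_attributes(item, prefix + "." + str(index)))
    elif isinstance(value, bool):
        flat[prefix] = "true" if value else "false"
    elif value is not None:
        flat[prefix] = str(value)
    return flat


def convert_v4_resources(state):
    """Map v4 'resources' entries to v3-style resource dictionaries."""
    converted = {}
    for resource in state.get("resources", []):
        for index, instance in enumerate(resource.get("instances", [])):
            # Assumption: always suffix the key with the instance index.
            key = "{}.{}.{}".format(resource["type"], resource["name"], index)
            attributes = flatten_attributes(instance.get("attributes", {}))
            converted[key] = {
                "type": resource["type"],
                "primary": {"attributes": attributes},
            }
    return converted
```

Converting internally lets the rest of an inventory script keep reading the v3 layout it already understands.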
Fixes situation when using manual mode because it
tries to download coredns v1.3.1 from the same
image repository where kubernetes images are
downloaded from.
Change-Id: Ibbec8a72c8162ce8befa74e2013a268737ea5f8a
* Refactor calico-rr to run in k8s cluster with taint
Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa
* add preinstall checks
* rework calico/rr role
Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8
* add empty calico-rr group
Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
* lvm packages removal during tear down skipped by default
* lvm utils execution PATH fixed for CentOS/RH
* Heketi updated to the latest version 9
Signed-off-by: Vitaliy Dmitriev <vi7alya@gmail.com>
* Let Premoderator script add labels
* Fix JQ error
* Minor fixes
* Debug patch label output
* Try again
* Try again
* Try again
* Try again
* Try again
* Minor cleanup
* Enable nodes to run calicoctl
per-node tasks require waiting for calico-node to be applied
Change-Id: Ibe1076b7334a2da0332f2dd766fde0c3f172d1f2
* cleanup tasks that should run on master
Change-Id: I43a837879ef41596f14657ecd7f813899b6865ae
* Switch run_once calico logic to just run on first master
Change-Id: I6893711e354f63c5e1eaf6ac2e23d9a6347a555d
Update README.md to link to the open issue that shows Ansible 2.8.x doesn't work with Kubespray. The requirements.txt file is already fixed to 2.7.8 so only the README needed updating, I think.
* Enable containerd to deploy vanilla containerd package
Fixes kubeadm references to CRI socket for containerd
Fixes download role cache feature to work with containerd
Change-Id: I2ab8f0031107e2f0d1a85c39b4beb66f08509a01
* use containerd for flannel-addons job
Change-Id: Ied375c7d65e64a625ffbd995ff16f2374067dee6
* add containerd vars
Change-Id: Ib9a8a04e501c481a86235413cbec63f3672baf91
* fixup vars
Change-Id: Ibea64e4b18405a578b52a13da100384582aa24c2
* more fixes
* fix rh repo
Change-Id: I00575a77cfb7b81d6095db5d918a52023c8f13ba
* Adjust helm host install for containerd
* Add calico 3.7.3 support
* add calico_datastore variable to policy controller role
* add missing clusterrole rules for calico policy controller
* disable calico kube controller when kdd mode is used for versions < 3.6
8080 is a pretty common port, using nodelocaldns_ip:8080 still
prevents node processes or hostNetwork=true processes from binding to *:8080
so switch to 9254 by default (prometheus port is 9253)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Use K8s 1.15
* Use Kubernetes 1.15 and use kubeadm.k8s.io/v1beta2 for
InitConfiguration.
* bump to v1.15.0
* Remove k8s 1.13 checksums.
* Update README kubernetes version 1.15.0.
* Update metrics server 0.3.3 for k8s 1.15
* Remove less than k8s 1.14 related code
* Use kubeadm with --upload-certs instead of --experimental-upload-certs because the latter is deprecated
* Update dnsautoscaler 1.6.0
* Skip certificateKey if it's not defined
* Add kubeadm-controlplane.v2beta2 for k8s 1.15 or later
* Support kubeadm control plane for k8s 1.15
* Update sonobuoy version 0.15.0 for k8s 1.15
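The switch from `--experimental-upload-certs` to `--upload-certs` noted in the list above can be sketched as a version check. The helper name `upload_certs_flag` is hypothetical, and the 1.15 boundary is taken from this commit's context; this is only an illustration of the selection, not the role's actual template logic.

```python
def upload_certs_flag(kube_version):
    """Return the kubeadm certificate-upload flag for a version like 'v1.15.0'."""
    major, minor = (int(part) for part in kube_version.lstrip("v").split(".")[:2])
    # Assumption from the commit context: the flag lost its
    # 'experimental' prefix starting with Kubernetes 1.15.
    if (major, minor) >= (1, 15):
        return "--upload-certs"
    return "--experimental-upload-certs"
```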
* Add limited containerd support
Containerd support for Ubuntu + Calico
* Added CRI-O support for ubuntu
* containerd support.
* Reset containerd support.
* fix lint.
* implemented feedback
* Change task name cri xx instead of cri-o in reset task and timeout condition.
* set crictl to fixed version
* Use docker-ce's containerd.io package for containerd.
* Add check containerd is installable or not.
* Avoid stop docker when use containerd and optimize retry for reset.
* Add config.toml.
* Fixed containerd for kubelet.env.
* Merge PR #4629
* Remove unused ubuntu variable for containerd
* Polish code for containerd and cri-o
* Refactoring cri socket configuration.
* Configurable conmon.
* Remove unused crictl/runc download
* Now crictl and runc are downloaded by common crictl.yml.
* fixed yamllint error
* Fixed files broken by conflict.
* Remove commented line in config.toml
* Remove readded v1.12.x version
* Fixed broken set_docker_image_facts
* Fix yamllint errors.
* Remove unused apt source
* Fix crictl could not be installed
* Add containerd config from skolekonov's PR #4601
* add macvlan cni to kubespray
* macvlan: lint yaml files and fix sample config file
* macvlan: add OWNERS file
* add macvlan to README
* macvlan : CI first shoot
* macvlan : CI add full masquerade
* delegate retrieve pod cidr to master only
* macvlan: add config for CI
* macvlan: add netchecker deployment
kubernetes/master role defines this value as an empty string
when using a cloud provider, not undefined. The check was updated
accordingly.
Change-Id: I58dc31ef4fd568a717a6753eb89ca687933018ae
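A minimal sketch of a check that treats an empty string the same as an undefined value. The commit does not name the variable, so `is_set` and its argument are purely illustrative.

```python
def is_set(value):
    """True only when value is defined and not an empty string."""
    # 'None' stands in for an undefined variable here; the role may
    # instead define the value as '' when a cloud provider is used.
    return value is not None and value != ""
```

Testing only for "defined" would wrongly accept the empty string that the role sets in the cloud provider case.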
* Require minimum version of Kubernetes
* Remove checksums for kubernetes version 1.12
* Add kube_version to precheck output and add min required version to README
* Fix merge
* Fix defaults
* Fix typo in precheck
* File and container image downloads are now cached locally, so that repeated vagrant up/down runs do not trigger downloading of those files. This is especially useful on laptops with kubernetes running locally on VMs. The total size of the cache, after an ansible run, is currently around 800MB, so bandwidth (=time) savings can be quite significant.
* When download_run_once is false, the default is still not to cache, but setting download_force_cache will still enable caching.
* The local cache location can be set with download_cache_dir and defaults to /tmp/kubernetes_cache
* A local docker instance is no longer required to cache docker images; Images are cached to file. A local docker instance is still required, though, if you wish to download images on localhost.
* Fixed a FIXME, where the argument was that delegate_to doesn't play nice with omit. That is a correct observation and the fix is to use default(inventory_host) instead of default(omit). See ansible/ansible#26009
* Removed "Register docker images info" task from download_container and set_docker_image_facts because it was faulty and unused.
* Removed redundant when:download.{container,enabled,run_once} conditions from {sync,download}_container.yml
* All features of commit d6fd0d2aca by Timoses <timosesu@gmail.com>, merged May 1st 2019, are included in this patch. Not all code was included verbatim, but each feature of that commit was checked to be working in this patch. One notable change: The actual downloading of the kubeadm images was moved to {download,sync}_container, to enable caching.
Note 1: I considered splitting this patch, but most changes that are not directly related to caching, are a pleasant by-product of implementing the caching code, so splitting would be impractical.
Note 2: I have my doubts about the usefulness of the upload, download and upgrade tags in the download role. Must they remain or can they be removed? If anybody knows, then please speak up.
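The interaction of the caching variables described above can be sketched as follows. This reflects one reading of the commit message (caching is on when either `download_run_once` or `download_force_cache` is set); the helper names and the flat file layout under the cache directory are assumptions.

```python
import os


def caching_enabled(download_run_once=False, download_force_cache=False):
    """Caching is on with run_once, or when explicitly forced."""
    return download_run_once or download_force_cache


def cache_path(filename, download_cache_dir="/tmp/kubernetes_cache"):
    """Location of a cached download (flat layout assumed for illustration)."""
    return os.path.join(download_cache_dir, filename)
```

With both variables left at `False`, nothing is cached, which matches the stated default.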
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
Task "kube-roter | Set cni directory permissions"
sets ownership of /opt/cni/bin to "kube"
Task "kube-router | Copy cni plugins"
copies the binaries from the archive setting the ownership
back to "root"
Fix "kube-roter" typo
Signed-off-by: Alberto Murillo <albertomurillosilva@gmail.com>
* Add support for arm images for hyperkube, kubeadm and cni_binary
* Add dummy etcd checksum for arm
This commit adds dummy etcd checksum for arm to avoid "no attribute" error
during setup.
* Add etcd host assert check
* Add 1.13.4 checksums of kubeadm and hyperkube for arm
* Update checksums of kubeadm and hyperkube for arm
* Add dummy checksums for calicoctl_binary_checksums dict
* disable gather_facts because it causes tests to fail
* Remove architecture check for etcd, due to unable to run tests
Long option --become was used in the example but in the comment describing it the short option -b was used.
Use same option in description and example to avoid confusion.
* Added pod psp in Rancher Local Path Provisioner
Added pod security policy (psp) in Rancher Local Path Provisioner.
Signed-off-by: André R. de Miranda <andre@miranda.work>
* Apply psp for Rancher Local Path Provisioner only when local_path_provisioner_namespace is not kube-system and also reorganized the templates
Kubespray waits for each drain to finish before starting the next one.
Draining nodes one after another seems better than draining in parallel, because resource availability should be checked each time.
However, this approach has one additional problem: evicted pods may be restarted on nodes that are themselves drained a little later.
A fast cordon before the heavy drain seems like an easy solution.
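The ordering argued for above can be sketched as a command plan: cordon every node in the batch first (cheap), then drain them one at a time (slow). The function name `upgrade_plan` is hypothetical and the drain flags are reduced to `--ignore-daemonsets` for brevity; this only illustrates the ordering, not the playbook's actual tasks.

```python
def upgrade_plan(nodes):
    """Build kubectl commands: cordon all nodes, then drain sequentially."""
    commands = []
    # Fast step: mark every node unschedulable so evicted pods cannot
    # land on a node that is about to be drained as well.
    for node in nodes:
        commands.append(["kubectl", "cordon", node])
    # Slow step: drain one node after another.
    for node in nodes:
        commands.append(["kubectl", "drain", node, "--ignore-daemonsets"])
    return commands
```

Each entry could be passed to `subprocess.run` in order, waiting for one drain to exit before starting the next.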
nginx fails to start because requiredDropCapabilities drops all capabilities.
The nginx requires the following capabilities:
- CHOWN
- SETGID
- SETUID
Signed-off-by: André R. de Miranda <andre@miranda.work>
* updated ansible pinning to prevent more possibilities of breaking changes
* more exact pinning of ansible version
* more exact pinning of ansible version and also all the rest
* added testing requirements.txt pinning settings
* removed boto from testing requirements.txt
Without this, pulls are considered for all
host groups, even if not targeted by the download's
`groups` list. Hence, a download/sync is triggered
even though the host does not require the image.
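A sketch of the group check this fix introduces: a pull is only considered when the host belongs to at least one group listed for the download. The function and argument names are illustrative, not taken from the role.

```python
def pull_considered(host_groups, download_groups):
    """True when the host is in at least one group targeted by the download."""
    return bool(set(host_groups) & set(download_groups))
```

A host that is only in `etcd`, for example, would then skip an image whose `groups` list names only `k8s-cluster`.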
* Disable kube_api_anonymous_auth by default to secure the setup
* Disable metrics-server in addons. Health endpoint is slow and unstable
* Fix anonymous-auth missing in configuration
* Cleanup a bit
* Fix kube anon auth
* Download to delegate and sync files when download_run_once
* Fail on error after saving container image
* Do not set changed status when downloaded container was up to date
* Only sync containers when they are actually required
Previously, non-required images (pull_required=false as
image existed on target host) were synced to the target
hosts. This failed as the image was not downloaded to
the download_delegate and hence was not available for
syncing.
* Sync containers when only missing on some hosts
* Consider images with multiple repo tags
* Enable kubeadm images pull/syncing with download_delegate
* Use kubeadm images list to pull/sync
'kubeadm config images pull' is replaced by collecting the images
list with 'kubeadm config images list' and using the commonly
used method of pull/syncing the images.
* Ensure containers are downloaded and synced for all hosts
* Fix download/syncing when download_delegate is a kubernetes host
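The download and sync decisions in the list above can be sketched as follows: the image is pulled to the delegate only if some host lacks it, and it is synced only to those hosts. The names `plan_sync` and `images_on_host` are hypothetical; storing every repo tag per host is one simple way to account for images with multiple tags.

```python
def plan_sync(image, images_on_host):
    """Decide where an image must be synced.

    images_on_host maps each host name to the set of image references
    (all repo tags) already present on that host.
    """
    missing = sorted(
        host for host, present in images_on_host.items() if image not in present
    )
    return {
        # Pull on the download delegate only when some host needs the image.
        "pull_on_delegate": bool(missing),
        # Sync only to hosts that lack it, even if others already have it.
        "sync_to": missing,
    }
```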
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines: https://git.k8s.io/community/contributors/guide#your-first-contribution and developer guide https://git.k8s.io/community/contributors/devel/development.md#development-guide
1. If this is your first time, please read our contributor guidelines: https://git.k8s.io/community/contributors/guide/first-contribution.md and developer guide https://git.k8s.io/community/contributors/devel/development.md
2. Please label this pull request according to what type of issue you are addressing, especially if this is a release targeted pull request. For reference on required PR/issue labels, read here:
## How to become a contributor and submit your own code
### Environment setup
It is recommended to use filter to manage the GitHub email notification, see [examples for setting filters to Kubernetes Github notifications](https://github.com/kubernetes/community/blob/master/communication/best-practices.md#examples-for-setting-filters-to-kubernetes-github-notifications)
To install development dependencies you can use `pip install -r tests/requirements.txt`
#### Linting
Kubespray uses `yamllint` and `ansible-lint`. To run them locally use `yamllint .` and `./tests/scripts/ansible-lint.sh`
#### Molecule
[molecule](https://github.com/ansible-community/molecule) is designed to help the development and testing of Ansible roles. In Kubespray you can run it all for all roles with `./tests/scripts/molecule_run.sh` or for a specific role (that you are working with) with `molecule test` from the role directory (`cd roles/my-role`).
When developing or debugging a role it can be useful to run `molecule create` and `molecule converge` separately. Then you can use `molecule login` to SSH into the test environment.
#### Vagrant
Vagrant with VirtualBox or libvirt driver helps you to quickly spin test clusters to test things end to end. See [README.md#vagrant](README.md)
### Contributing A Patch
1. Submit an issue describing your proposed change to the repo in question.
2. The [repo owners](OWNERS) will respond to your issue promptly.
3. Fork the desired repo, develop and test your code changes.
4. Sign the CNCF CLA (https://git.k8s.io/community/CLA.md#the-contributor-license-agreement)
4. Sign the CNCF CLA (<https://git.k8s.io/community/CLA.md#the-contributor-license-agreement>)
If you have questions, check the [documentation](https://kubespray.io) and join us on the [kubernetes slack](https://kubernetes.slack.com), channel **\#kubespray**.
If you have questions, check the documentation at [kubespray.io](https://kubespray.io) and join us on the [kubernetes slack](https://kubernetes.slack.com), channel **\#kubespray**.
You can get your invite [here](http://slack.k8s.io/)
- Can be deployed on **AWS, GCE, Azure, OpenStack, vSphere, Packet (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal**
- Can be deployed on **[AWS](docs/aws.md), GCE, [Azure](docs/azure.md), [OpenStack](docs/openstack.md), [vSphere](docs/vsphere.md), [Packet](docs/packet.md) (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal**
- **Highly available** cluster
- **Composable** (Choice of the network plugin for instance)
- Supports most popular **Linux distributions**
- **Continuous integration tests**
Quick Start
-----------
## Quick Start
To deploy the cluster you can use :
@ -21,31 +19,35 @@ To deploy the cluster you can use :
#### Usage
```ShellSession
# Install dependencies from ``requirements.txt``
sudo pip install -r requirements.txt
sudo pip3 install -r requirements.txt
# Copy ``inventory/sample`` as ``inventory/mycluster``
cp -rfp inventory/sample inventory/mycluster
# Update Ansible inventory file with inventory builder
Note: When Ansible is already installed via system packages on the control machine, other python packages installed via `sudo pip install -r requirements.txt` will go to a different directory tree (e.g. `/usr/local/lib/python2.7/dist-packages` on Ubuntu) from Ansible's (e.g. `/usr/lib/python2.7/dist-packages/ansible` still on Ubuntu).
As a consequence, `ansible-playbook` command will fail with:
```
```raw
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
```
probably pointing on a task depending on a module present in requirements.txt (i.e. "unseal vault").
One way of solving this would be to uninstall the Ansible package and then, to install it via pip but it is not always possible.
@ -56,20 +58,24 @@ A workaround consists of setting `ANSIBLE_LIBRARY` and `ANSIBLE_MODULE_UTILS` en
For Vagrant we need to install python dependencies for provisioning tasks.
Check if Python and pip are installed:
```ShellSession
python -V && pip -V
```
If this returns the version of the software, you're good to go. If not, download and install Python from here <https://www.python.org/downloads/source/>
Install the necessary requirements
```ShellSession
sudo pip install -r requirements.txt
vagrant up
```
Documents
---------
## Documents
- [Requirements](#requirements)
- [Kubespray vs ...](docs/comparisons.md)
- [Getting started](docs/getting-started.md)
- [Setting up your first cluster](docs/setting-up-your-first-cluster.md)
- [Ansible inventory and tags](docs/ansible.md)
- [Integration with existing ansible repo](docs/integration.md)
- [Deployment data variables](docs/vars.md)
@ -77,7 +83,8 @@ Documents
- [HA mode](docs/ha-mode.md)
- [Network plugins](#network-plugins)
- [Vagrant install](docs/vagrant.md)
- [CoreOS bootstrap](docs/coreos.md)
- [Flatcar Container Linux bootstrap](docs/flatcar.md)
Note: The list of validated [docker versions](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md) was updated to 1.11.1, 1.12.1, 1.13.1, 17.03, 17.06, 17.09, 18.06. kubeadm now properly recognizes Docker 18.09.0 and newer, but still treats 18.06 as the default supported version. The kubelet might break on docker's non-standard version numbering (it no longer uses semantic versioning). To ensure auto-updates don't break your cluster look into e.g. yum versionlock plugin or apt pin).
Note: The list of validated [docker versions](https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker) is 1.13.1, 17.03, 17.06, 17.09, 18.06, 18.09 and 19.03. The recommended docker version is 19.03. The kubelet might break on docker's non-standard version numbering (it no longer uses semantic versioning). To ensure auto-updates don't break your cluster look into e.g. yum versionlock plugin or apt pin).
Requirements
------------
## Requirements
- **Ansible v2.7.8 (or newer) and python-netaddr is installed on the machine
that will run Ansible commands**
- **Jinja 2.9 (or newer) is required to run the Ansible Playbooks**
- The target servers must have **access to the Internet** in order to pull docker images. Otherwise, additional configuration is required (See [Offline Environment](https://github.com/kubernetes-sigs/kubespray/blob/master/docs/downloads.md#offline-environment))
- **Minimum required version of Kubernetes is v1.17**
- **Ansible v2.9+, Jinja 2.11+ and python-netaddr is installed on the machine that will run Ansible commands**
- The target servers must have **access to the Internet** in order to pull docker images. Otherwise, additional configuration is required (See [Offline Environment](docs/offline-environment.md))
- The target servers are configured to allow **IPv4 forwarding**.
- **Your ssh key must be copied** to all the servers part of your inventory.
- The **firewalls are not managed**, you'll need to implement your own rules the way you used to.
@ -153,14 +164,16 @@ These limits are safe guarded by Kubespray. Actual requirements for your workloa
- Node
- Memory: 1024 MB
Network Plugins
---------------
## Network Plugins
You can choose between 6 network plugins. (default: `calico`, except Vagrant uses `flannel`)
You can choose between 10 network plugins. (default: `calico`, except Vagrant uses `flannel`)
- [Calico](https://docs.projectcalico.org/latest/introduction/) is a networking and network policy provider. Calico supports a flexible set of networking options
designed to give you the most efficient networking across a range of situations, including non-overlay
and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts,
pods, and (if using Istio and Envoy) applications at the service mesh layer.
- [canal](https://github.com/projectcalico/canal): a composition of calico and flannel plugins.
@ -169,38 +182,48 @@ You can choose between 6 network plugins. (default: `calico`, except Vagrant use
- [contiv](docs/contiv.md): supports vlan, vxlan, bgp and Cisco SDN networking. This plugin is able to
apply firewall policies, segregate containers in multiple network and bridging pods onto physical networks.
- [ovn4nfv](docs/ovn4nfv.md): [ovn4nfv-k8s-plugins](https://github.com/opnfv/ovn4nfv-k8s-plugin) is the network controller, OVS agent and CNI server to offer basic SFC and OVN overlay networking.
- [weave](docs/weave.md): Weave is a lightweight container overlay network that doesn't require an external K/V database cluster.
(Please refer to `weave` [troubleshooting documentation](http://docs.weave.works/weave/latest_release/troubleshooting.html)).
(Please refer to `weave` [troubleshooting documentation](https://www.weave.works/docs/net/latest/troubleshooting/)).
- [kube-ovn](docs/kube-ovn.md): Kube-OVN integrates the OVN-based Network Virtualization with Kubernetes. It offers an advanced Container Network Fabric for Enterprises.
- [kube-router](docs/kube-router.md): Kube-router is a L3 CNI for Kubernetes networking aiming to provide operational
simplicity and high performance: it uses IPVS to provide Kube Services Proxy (if setup to replace kube-proxy),
iptables for network policies, and BGP for pods L3 networking (with optionally BGP peering with out-of-cluster BGP peers).
It can also optionally advertise routes to Kubernetes cluster Pods CIDRs, ClusterIPs, ExternalIPs and LoadBalancerIPs.
- [macvlan](docs/macvlan.md): Macvlan is a Linux network driver. Pods have their own unique MAC and IP address, connected directly to the physical (layer 2) network.
- [multus](docs/multus.md): Multus is a meta CNI plugin that provides multiple network interface support to pods. For each interface Multus delegates CNI calls to secondary CNI plugins such as Calico, macvlan, etc.
The choice is defined with the variable `kube_network_plugin`. There is also an
option to leverage built-in cloud provider networking instead.
The Kubespray Project is released on an as-needed basis. The process is as follows:
1. An issue is proposing a new release with a changelog since the last release
2. At least one of the [OWNERS](OWNERS) must LGTM this release
3. An OWNER runs `git tag -s $VERSION` and inserts the changelog and pushes the tag with `git push $VERSION`
4. The release issue is closed
5. An announcement email is sent to `kubernetes-dev@googlegroups.com` with the subject `[ANNOUNCE] Kubespray $VERSION is released`
2. At least one of the [approvers](OWNERS_ALIASES) must approve this release
3. The `kube_version_min_required` variable is set to `n-1`
4. Remove hashes for [EOL versions](https://github.com/kubernetes/sig-release/blob/master/releases/patch-releases.md) of kubernetes from `*_checksums` variables.
5. An approver creates [new release in GitHub](https://github.com/kubernetes-sigs/kubespray/releases/new) using a version and tag name like `vX.Y.Z` and attaching the release notes
6. An approver creates a release branch in the form `release-X.Y`
7. The corresponding version of [quay.io/kubespray/kubespray:vX.Y.Z](https://quay.io/repository/kubespray/kubespray) and [quay.io/kubespray/vagrant:vX.Y.Z](https://quay.io/repository/kubespray/vagrant) docker images are built and tagged
8. The `KUBESPRAY_VERSION` variable is updated in `.gitlab-ci.yml`
9. The release issue is closed
10. An announcement email is sent to `kubernetes-dev@googlegroups.com` with the subject `[ANNOUNCE] Kubespray $VERSION is released`
11. The topic of the #kubespray channel is updated with `vX.Y.Z is released! | ...`
## Major/minor releases, merge freezes and milestones
## Major/minor releases and milestones
* Kubespray does not maintain stable branches for releases. Releases are tags, not
branches, and there are no backports. Therefore, there is no need for merge
freezes as well.
* For major releases (vX.Y) Kubespray maintains one branch (`release-X.Y`). Minor releases (vX.Y.Z) are available only as tags.
* Fixes for major releases (vX.x.0) and minor releases (vX.Y.x) are delivered
* Security patches and bugs might be backported.
* Fixes for major releases (vX.Y) and minor releases (vX.Y.Z) are delivered
via maintenance releases (vX.Y.Z) and assigned to the corresponding open
milestone (vX.Y). That milestone remains open for the major/minor releases
support lifetime, which ends once the milestone closed. Then only a next major
# node that can be used to access the masters and minions
use_bastion: false
# Set this to a prefered name that will be used as the first part of the dns name for your bastion host. For example: k8s-bastion.<azureregion>.cloudapp.azure.com.
# Set this to a preferred name that will be used as the first part of the dns name for your bastion host. For example: k8s-bastion.<azureregion>.cloudapp.azure.com.
# This is convenient when exceptions have to be configured on a firewall to allow ssh to the given bastion host.
# Running systemd-machine-id-setup doesn't create a unique id for each node container on Debian,
# handle manually
- name: Re-create unique machine-id (as we may just get what comes in the docker image), needed by some CNIs for mac address seeding (notably weave)
- name: Re-create unique machine-id (as we may just get what comes in the docker image), needed by some CNIs for mac address seeding (notably weave) # noqa 301
MetalLB hooks into your Kubernetes cluster, and provides a network load-balancer implementation. In short, it allows you to create Kubernetes services of type “LoadBalancer” in clusters that don’t run on a cloud provider, and thus cannot simply hook into paid products to provide load-balancers.
```
This playbook aims to automate [this](https://metallb.universe.tf/tutorial/layer2/tutorial). It deploys MetalLB into kubernetes and sets up a layer 2 loadbalancer.
@ -21,7 +21,7 @@ You can specify a `default_release` for apt on Debian/Ubuntu by overriding this
glusterfs_ppa_use: yes
glusterfs_ppa_version: "3.5"
For Ubuntu, specify whether to use the official Gluster PPA, and which version of the PPA to use. See Gluster's [Getting Started Guide](http://www.gluster.org/community/documentation/index.php/Getting_started_install) for more info.
For Ubuntu, specify whether to use the official Gluster PPA, and which version of the PPA to use. See Gluster's [Getting Started Guide](https://docs.gluster.org/en/latest/Quick-Start-Guide/Quickstart/) for more info.
cluster: "{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip']|default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
- Update `contrib/terraform/aws/terraform.tfvars` with your data. By default, the Terraform scripts use CoreOS as base image. If you want to change this behaviour, see note "Using other distrib than CoreOs" below.
- Update `contrib/terraform/aws/terraform.tfvars` with your data. By default, the Terraform scripts use Ubuntu 18.04 LTS (Bionic) as base image. If you want to change this behaviour, see note "Using other distrib than Ubuntu" below.
- Create an AWS EC2 SSH Key
- Run with `terraform apply --var-file="credentials.tfvars"` or `terraform apply`, depending on whether you exported your AWS credentials
If you want to use another distribution than CoreOS, you can modify the search filters of the 'data "aws_ami" "distro"' in variables.tf.
***Using other distrib than Ubuntu***
If you want to use another distribution than Ubuntu 18.04 (Bionic) LTS, you can modify the search filters of the 'data "aws_ami" "distro"' in variables.tf.
For example, to use:
- Debian Jessie, replace 'data "aws_ami" "distro"' in variables.tf with
@@ -220,12 +229,14 @@ set OS_PROJECT_DOMAIN_NAME=Default
The construction of the cluster is driven by values found in
[variables.tf](variables.tf).
For your cluster, edit `inventory/$CLUSTER/cluster.tf`.
For your cluster, edit `inventory/$CLUSTER/cluster.tfvars`.
|Variable | Description |
|---------|-------------|
|`cluster_name` | All OpenStack resources will use the Terraform variable `cluster_name` (default `example`) in their name to make it easier to track. For example the first compute resource will be named `example-kubernetes-1`. |
|`az_list` | List of Availability Zones available in your OpenStack cluster. |
|`network_name` | The name to be given to the internal network that will be generated |
|`network_dns_domain` | (Optional) The dns_domain for the internal network that will be generated |
|`dns_nameservers`| An array of DNS name server names to be used by hosts in the internal subnet. |
|`floatingip_pool` | Name of the pool from which floating IPs will be allocated |
|`external_net` | UUID of the external network that will be routed to |
@@ -246,6 +257,114 @@ For your cluster, edit `inventory/$CLUSTER/cluster.tf`.
|`master_allowed_remote_ips` | List of CIDR blocks allowed to initiate an API connection, `["0.0.0.0/0"]` by default |
|`k8s_allowed_remote_ips` | List of CIDR blocks allowed to initiate an SSH connection, empty by default |
|`worker_allowed_ports` | List of ports to open on worker nodes, `[{ "protocol" = "tcp", "port_range_min" = 30000, "port_range_max" = 32767, "remote_ip_prefix" = "0.0.0.0/0"}]` by default |
|`wait_for_floatingip` | Let Terraform poll the instance until the floating IP has been associated, `false` by default. |
|`node_root_volume_size_in_gb` | Size of the root volume for nodes, 0 to use ephemeral storage |
|`master_root_volume_size_in_gb` | Size of the root volume for masters, 0 to use ephemeral storage |
|`gfs_root_volume_size_in_gb` | Size of the root volume for gluster, 0 to use ephemeral storage |
|`etcd_root_volume_size_in_gb` | Size of the root volume for etcd nodes, 0 to use ephemeral storage |
|`bastion_root_volume_size_in_gb` | Size of the root volume for bastions, 0 to use ephemeral storage |
|`use_server_group` | Create and use openstack nova servergroups, default: false |
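To make the shape of the `worker_allowed_ports` default concrete, here is a small Python sketch of how such a rule could be read: traffic matches when the protocol is equal, the port lies within the range, and the source address falls inside `remote_ip_prefix`. The helper `is_allowed` is hypothetical and only illustrates the rule fields; it is not how Terraform or OpenStack evaluates security groups.

```python
import ipaddress

# Default value of worker_allowed_ports from the table above
DEFAULT_WORKER_ALLOWED_PORTS = [
    {
        "protocol": "tcp",
        "port_range_min": 30000,
        "port_range_max": 32767,
        "remote_ip_prefix": "0.0.0.0/0",
    },
]


def is_allowed(rules, protocol, port, source_ip):
    """Return True if any rule covers the given protocol, port and source."""
    source = ipaddress.ip_address(source_ip)
    for rule in rules:
        network = ipaddress.ip_network(rule["remote_ip_prefix"])
        in_range = rule["port_range_min"] <= port <= rule["port_range_max"]
        if rule["protocol"] == protocol and in_range and source in network:
            return True
    return False


if __name__ == "__main__":
    # 203.0.113.7 is a documentation address used purely as an example
    print(is_allowed(DEFAULT_WORKER_ALLOWED_PORTS, "tcp", 30080, "203.0.113.7"))
    print(is_allowed(DEFAULT_WORKER_ALLOWED_PORTS, "tcp", 22, "203.0.113.7"))
```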
If you've started the Ansible run, it may also be a good idea to do some manual cleanup:
@@ -325,6 +444,30 @@ $ ssh-add ~/.ssh/id_rsa
If you have deployed and destroyed a previous iteration of your cluster, you will need to clear out any stale keys from your SSH "known hosts" file (`~/.ssh/known_hosts`).
#### Metadata variables
The [python script](../terraform.py) that reads the
generated `.tfstate` file to generate a dynamic inventory recognizes
some variables within a "metadata" block, defined in a "resource"
As the example shows, these let you define the SSH username for
Ansible, a Python binary which is needed by Ansible if
`/usr/bin/python` doesn't exist, and whether the IPv6 address of the
instance should be preferred over IPv4.
#### Bastion host
Bastion access will be determined by:
@@ -339,7 +482,7 @@ So, either a bastion host, or at least master/node with a floating IP are requir
#### Test access
Make sure you can connect to the hosts. Note that Container Linux by CoreOS will have a state `FAILED` due to Python not being present. This is okay, because Python will be installed during bootstrapping, so long as the hosts are not `UNREACHABLE`.
Make sure you can connect to the hosts. Note that Flatcar Container Linux by Kinvolk will have a state `FAILED` due to Python not being present. This is okay, because Python will be installed during bootstrapping, so long as the hosts are not `UNREACHABLE`.
- Set the maximum number of Cinder volumes that can be attached per host (default 256)
```
node_volume_attach_limit: 26
```
- Disable access_ip; this makes all internal cluster traffic go over the local network when a floating IP is attached (by default this value is set to 1)
Try out your new Kubernetes cluster with the [Hello Kubernetes service](https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/).
## Appendix
### Migration from `number_of_k8s_nodes*` to `k8s_nodes`
If you currently have a cluster defined using the `number_of_k8s_nodes*` variables and wish
to migrate to the `k8s_nodes` style you can do it like so:
$ terraform state mv 'module.compute.openstack_compute_floatingip_associate_v2.k8s_node[0]' 'module.compute.openstack_compute_floatingip_associate_v2.k8s_nodes["1"]'
Move "module.compute.openstack_compute_floatingip_associate_v2.k8s_node[0]" to "module.compute.openstack_compute_floatingip_associate_v2.k8s_nodes[\"1\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.compute.openstack_compute_floatingip_associate_v2.k8s_node[1]' 'module.compute.openstack_compute_floatingip_associate_v2.k8s_nodes["2"]'
Move "module.compute.openstack_compute_floatingip_associate_v2.k8s_node[1]" to "module.compute.openstack_compute_floatingip_associate_v2.k8s_nodes[\"2\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.compute.openstack_compute_floatingip_associate_v2.k8s_node[2]' 'module.compute.openstack_compute_floatingip_associate_v2.k8s_nodes["3"]'
Move "module.compute.openstack_compute_floatingip_associate_v2.k8s_node[2]" to "module.compute.openstack_compute_floatingip_associate_v2.k8s_nodes[\"3\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.compute.openstack_compute_instance_v2.k8s_node[0]' 'module.compute.openstack_compute_instance_v2.k8s_node["1"]'
Move "module.compute.openstack_compute_instance_v2.k8s_node[0]" to "module.compute.openstack_compute_instance_v2.k8s_node[\"1\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.compute.openstack_compute_instance_v2.k8s_node[1]' 'module.compute.openstack_compute_instance_v2.k8s_node["2"]'
Move "module.compute.openstack_compute_instance_v2.k8s_node[1]" to "module.compute.openstack_compute_instance_v2.k8s_node[\"2\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.compute.openstack_compute_instance_v2.k8s_node[2]' 'module.compute.openstack_compute_instance_v2.k8s_node["3"]'
Move "module.compute.openstack_compute_instance_v2.k8s_node[2]" to "module.compute.openstack_compute_instance_v2.k8s_node[\"3\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.ips.openstack_networking_floatingip_v2.k8s_node[0]' 'module.ips.openstack_networking_floatingip_v2.k8s_node["1"]'
Move "module.ips.openstack_networking_floatingip_v2.k8s_node[0]" to "module.ips.openstack_networking_floatingip_v2.k8s_node[\"1\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.ips.openstack_networking_floatingip_v2.k8s_node[1]' 'module.ips.openstack_networking_floatingip_v2.k8s_node["2"]'
Move "module.ips.openstack_networking_floatingip_v2.k8s_node[1]" to "module.ips.openstack_networking_floatingip_v2.k8s_node[\"2\"]"
Successfully moved 1 object(s).
$ terraform state mv 'module.ips.openstack_networking_floatingip_v2.k8s_node[2]' 'module.ips.openstack_networking_floatingip_v2.k8s_node["3"]'
Move "module.ips.openstack_networking_floatingip_v2.k8s_node[2]" to "module.ips.openstack_networking_floatingip_v2.k8s_node[\"3\"]"
Successfully moved 1 object(s).
```
For nodes without floating IPs, these steps can be omitted.
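Typing these commands by hand gets tedious for larger clusters. The following Python sketch prints the same `terraform state mv` commands as the listing above for a given number of nodes. It assumes, as in the listing, that the node at index `N` maps to the key `"N+1"`; the resource names are copied from the listing, while the function name `state_mv_commands` is hypothetical. Review the output against your own `k8s_nodes` keys before running anything.

```python
# (module, resource type, old name, new name), copied from the listing above
RESOURCES = [
    ("module.compute", "openstack_compute_floatingip_associate_v2", "k8s_node", "k8s_nodes"),
    ("module.compute", "openstack_compute_instance_v2", "k8s_node", "k8s_node"),
    ("module.ips", "openstack_networking_floatingip_v2", "k8s_node", "k8s_node"),
]


def state_mv_commands(node_count):
    """Build one `terraform state mv` command per resource and node."""
    commands = []
    for module, resource_type, old_name, new_name in RESOURCES:
        for index in range(node_count):
            old = f"{module}.{resource_type}.{old_name}[{index}]"
            # Assumption: index 0 becomes key "1", index 1 becomes key "2", ...
            new = f'{module}.{resource_type}.{new_name}["{index + 1}"]'
            commands.append(f"terraform state mv '{old}' '{new}'")
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is executed
    for command in state_mv_commands(3):
        print(command)
```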
An SSH keypair is required so Ansible can access the newly provisioned nodes (bare metal Packet hosts). By default, the public SSH key defined in cluster.tf will be installed in authorized_key on the newly provisioned nodes (~/.ssh/id_rsa.pub). Terraform will upload this public key and then it will be distributed out to all the nodes. If you have already set this public key in Packet (i.e. via the portal), then set the public keyfile name in cluster.tf to blank to prevent the duplicate key from being uploaded which will cause an error.
An SSH keypair is required so Ansible can access the newly provisioned nodes (bare metal Packet hosts). By default, the public SSH key defined in cluster.tfvars will be installed in authorized_key on the newly provisioned nodes (~/.ssh/id_rsa.pub). Terraform will upload this public key and then it will be distributed out to all the nodes. If you have already set this public key in Packet (i.e. via the portal), then set the public keyfile name in cluster.tfvars to blank to prevent the duplicate key from being uploaded which will cause an error.
If you don't already have a keypair generated (~/.ssh/id_rsa and ~/.ssh/id_rsa.pub), then a new keypair can be generated with the command:
@@ -72,7 +72,7 @@ If someone gets this key, they can startup/shutdown hosts in your project!
For more information on how to generate an API key or find your project ID, please see:
If you've started the Ansible run, it may also be a good idea to do some manual cleanup:
@@ -176,7 +176,7 @@ If you have deployed and destroyed a previous iteration of your cluster, you wil
#### Test access
Make sure you can connect to the hosts. Note that Container Linux by CoreOS will have a state `FAILED` due to Python not being present. This is okay, because Python will be installed during bootstrapping, so long as the hosts are not `UNREACHABLE`.
Make sure you can connect to the hosts. Note that Flatcar Container Linux by Kinvolk will have a state `FAILED` due to Python not being present. This is okay, because Python will be installed during bootstrapping, so long as the hosts are not `UNREACHABLE`.
Some files were not shown because too many files have changed in this diff