Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 1.5
- Epic Link:
- Sprint: Summer 1: 8 Jun - 19 Jun, Summer 2: 22 Jun - 3 Jul, Summer 3: 6 Jul - 17 Jul, Summer 4: 14 Jul - 28 Jul, Summer 5: 3 Aug - 14 Aug, Summer 6: 17 Aug - 28 Aug, Summer 7: 31 Aug - 11 Sep, Fall 1: 14 Sep - 25 Sep, Fall 2: 28 Sep - 9 Oct
Description
When we stand up a new appstore with all new code, we often want to copy data (mysql database and s3 bucket contents) to a new database and new s3 bucket to be used by the new appstore.
Currently, we do not have a good way to copy the s3 bucket contents. (We recently developed code for copying the mysql database, however - see IGBF-2421.)
For this task, we'll write some code that copies data from an existing s3 bucket into a newly created one for the new ec2 to use.
Let's add some new tasks that accomplish this goal to the end of main.yml in role S3 in appstore playbooks.
The new tasks should check to see if a variable first_s3_bucket_name exists. This is to ensure that if the user is not actually trying to copy over some S3 contents, the tasks will not run.
If variable first_s3_bucket_name exists and first_s3_bucket_name does not equal s3_bucket_name and s3_bucket_name is empty, copy the contents of first_s3_bucket_name into s3_bucket_name.
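As a rough sketch, the gating could look something like this (task bodies are placeholders; only the conditions matter here, and the "s3_bucket_name is empty" check would be a separate lookup, discussed further below):

- name: Copy contents of first_s3_bucket_name into s3_bucket_name
  # Placeholder body; the real task would perform the copy.
  debug:
    msg: "Would copy s3://{{ first_s3_bucket_name }} to s3://{{ s3_bucket_name }}"
  when:
    - first_s3_bucket_name is defined
    - first_s3_bucket_name != s3_bucket_name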
Note: I'm not entirely sure whether the mysql database contains references to particular s3 buckets or whether it uses relative paths of some kind. Someone needs to check this. If it contains references to the s3 bucket name, we will need to add some extra logic to deal with that.
Attachments
Issue Links
- blocks IGBF-2426 Write next draft database migration code (Closed)
Activity
Hi Chester Dias! Can you take this one next?
Sure. Will get started immediately
I have created a new role for copying one s3 content to another bucket.
Code Flow
1. Check if the source and destination bucket names are not the same.
2. Check if both buckets are present.
3. Check to ensure the destination bucket (s3_bucket_name) is empty before copying.
4. Copy the content from first_s3_bucket_name to s3_bucket_name using the shell aws-cli commands.
Note:
1. The control node EC2 needs to have an S3 role attached and 'aws-cli' installed; this may require a change to the control-node playbooks.
2. There isn't an Ansible module for copying S3 bucket content directly from one bucket to another; the normal module-based approach would require downloading the content to the EC2 first.
3. The approach followed here uses the aws-cli 'sync' command to copy content from one bucket to the other.
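For illustration, the shell-based sync task could look roughly like this (a sketch using the bucket-name variables above, not the exact task from the branch):

- name: Copy content from first_s3_bucket_name to s3_bucket_name
  # aws s3 sync between two s3:// URLs issues server-side copy requests,
  # so objects are not downloaded to the control node.
  shell: "aws s3 sync s3://{{ first_s3_bucket_name }} s3://{{ s3_bucket_name }}"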
Please Review: https://bitbucket.org/chesterdias/chester-local-appstore-playbooks/branch/IGBF-2427#diff
Please add the tasks to the existing s3 role.
Please don't try to get a listing of all buckets in the account. Instead, assume the previous tasks have run and that s3_bucket_name exists.
Clarifying:
We want to ensure that if the user (me) has defined a variable first_s3_bucket_name, then the playbook will copy the contents of first_s3_bucket_name over to s3_bucket_name, which was created and configured in the previous tasks.
However, we only want that to happen once. Unfortunately, because we are using the aws cli directly, we have to create our own idempotency here. Checking whether or not s3_bucket_name is empty before running the aws cli command seems like a good way to do this.
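One way to express that check (a minimal sketch using the aws_s3 module in list mode; the task names and the register name are illustrative):

- name: List keys in the destination bucket
  aws_s3:
    bucket: "{{ s3_bucket_name }}"
    mode: list
  register: destination_content

- name: Copy source to target only if target is empty
  shell: "aws s3 sync s3://{{ first_s3_bucket_name }} s3://{{ s3_bucket_name }}"
  when: destination_content.s3_keys | length == 0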
ok will move these tasks to existing s3 role and remove the check.
I have moved the content of this role into role s3 and removed the listing of all buckets; instead, I list the objects to check whether the first_s3_bucket_name bucket is present.
If it is present, we copy its content to s3_bucket_name, provided s3_bucket_name is empty.
Instead of fail, use predicates or another approach to ensure that the subsequent aws cli task will not run when the target S3 bucket already has content.
- fail:
    msg: Destination Bucket already has content....Please verify.
  when: destination_content.s3_keys | string != "[]"
I have removed the fail module and moved the logic into the copy task so that it copies only if the destination bucket is empty.
Minor change request:
Use "first_s3_bucket_name" instead of "first_bucket_name" as the name of the s3 to bucket to copied.
Done
Thank you! Please submit PR when convenient.
cc: Chester Dias
Thanks for the PR. Merged.
Playbook fails with the following error:
TASK [s3 : Copy source to target if target is empty] ************************************************
fatal: [localhost]: FAILED! => changed=true
  cmd: aws s3 sync s3://"testappstore-xyz" s3://"testappstore3-xyz"
  delta: '0:00:00.817492'
  end: '2020-07-26 02:58:57.455672'
  msg: non-zero return code
  rc: 1
  start: '2020-07-26 02:58:56.638180'
  stderr: 'fatal error: Unable to locate credentials'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
Please fix by updating:
- control_node_install
which installs software on the control node. It looks like the error is coming from the control node not having its local aws cli configured with the expected credentials.
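If installing aws-cli is part of the fix, a task along these lines could be added to control_node_install (a sketch only; the role may already manage Python packages differently, and the credentials issue itself still needs the S3 role discussed below):

- name: Install the AWS CLI on the control node
  pip:
    name: awscli
    state: present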
To set up the control node in AWS, run:
ansible-playbook control_node.yml
after creating secrets.yml and common.yml.
The error occurs due to a missing S3 Role associated with the control node. I will make the required changes to the control node creation playbooks.
Making those changes to the control node would grant it full admin access. Instead, the approach used is to temporarily grant 's3 only' privileges to the control node via an IAM role for the duration of the copy, and to remove those privileges once the copy operation completes.
No full-access credentials are being configured in the control node's aws cli. Once the role is attached, it grants temporary S3 operation capabilities through temporary credentials generated by AWS.
Please Review: https://bitbucket.org/chesterdias/chester-local-appstore-playbooks/branch/IGBF-2427#diff
Change request:
- Only do the new tasks (such as creating and destroying the temporary IAM role for the Control Node) when it is necessary.
For example, if the S3 bucket has already been copied, then don't create the new role. Also, if there is no S3 bucket to actually copy from, then don't create the new role. Please carefully review the logic in this playbook to make sure that these new tasks are only going to be run when absolutely necessary.
Changes made as requested
Please Review: https://bitbucket.org/chesterdias/chester-local-appstore-playbooks/branch/IGBF-2427#diff
There is a conflict with one of the files - please rebase on the latest master branch to assess and resolve.
I tried rebasing; please check now.
Thanks!
Please see comment on PR for small change request.
I am getting the following error when I attempt to run the playbooks:
TASK [s3 : Create IAM Managed Policy] ******************************************
ok: [localhost]
TASK [s3 : List keys from Source Bucket to ensure it is present] ***************
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: TypeError: expected string or bytes-like object
fatal: [localhost]: FAILED! => changed=false
  module_stderr: |-
    Traceback (most recent call last):
      File "/home/ec2-user/.ansible/tmp/ansible-tmp-1597866752.6391451-24927-39053629397825/AnsiballZ_aws_s3.py", line 102, in <module>
        _ansiballz_main()
      File "/home/ec2-user/.ansible/tmp/ansible-tmp-1597866752.6391451-24927-39053629397825/AnsiballZ_aws_s3.py", line 94, in _ansiballz_main
        invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
      File "/home/ec2-user/.ansible/tmp/ansible-tmp-1597866752.6391451-24927-39053629397825/AnsiballZ_aws_s3.py", line 40, in invoke_module
        runpy.run_module(mod_name='ansible.modules.cloud.amazon.aws_s3', init_globals=None, run_name='__main__', alter_sys=True)
      File "/usr/lib64/python3.7/runpy.py", line 205, in run_module
        return _run_module_code(code, init_globals, run_name, mod_spec)
      File "/usr/lib64/python3.7/runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/tmp/ansible_aws_s3_payload_dcohcj7g/ansible_aws_s3_payload.zip/ansible/modules/cloud/amazon/aws_s3.py", line 915, in <module>
      File "/tmp/ansible_aws_s3_payload_dcohcj7g/ansible_aws_s3_payload.zip/ansible/modules/cloud/amazon/aws_s3.py", line 772, in main
      File "/tmp/ansible_aws_s3_payload_dcohcj7g/ansible_aws_s3_payload.zip/ansible/modules/cloud/amazon/aws_s3.py", line 363, in bucket_check
      File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 608, in _make_api_call
        api_params, operation_model, context=request_context)
      File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 654, in _convert_to_request_dict
        api_params, operation_model, context)
      File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 686, in _emit_api_params
        params=api_params, model=operation_model, context=context)
      File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
        return self._emitter.emit(aliased_event_name, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
        return self._emit(event_name, kwargs)
      File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
        response = handler(**kwargs)
      File "/usr/local/lib/python3.7/site-packages/botocore/handlers.py", line 200, in validate_bucket_name
        if not VALID_BUCKET.search(bucket) and not VALID_S3_ARN.search(bucket):
    TypeError: expected string or bytes-like object
  module_stdout: ''
  msg: |-
    MODULE FAILURE
    See stdout/stderr for the exact error
  rc: 1
PLAY RECAP *********************************************************************
localhost : ok=7 changed=0 unreachable=0 failed=1 skipped=4 rescued=0 ignored=0
Chester Dias - can you please take a look?
The above code was run to set up and configure a new appstore, without a pre-existing s3 bucket to copy from. That is, the "first_s3_bucket_name", "first_rds_db_name", and "first_db_instance_id" variables were not set.
common.yml looked like this:
### Variables used by setup.yml ###
#
#
# You must set this variable:
# There should only ever be one - ec2_name must be unique for
# all EC2's owned by your AWS account.
#
# Also, this variable must be a legal name for a mysql database.
# The current default setting (see below) assigns the same name to
# the EC2 and its companion mysql database.
#
# See: http://dev.csx.ovgu.de/db/mysql/Legal-names.html
#
ec2_name: devappstore1

# You must set this variable:
# This is the subnet id for the EC2 (required to create an EC2).
# You have to tell AWS which network the EC2 will belong to.
vpc_subnet_id: subnet-16b3884e

# You must set this variable:
# Here indicate the name of the RDS host where the Dev AppStore database
# will reside. Note that it will likely be shared with other Dev
# AppStore databases! This is done to save expense because Amazon
# charges for RDS host uptime but not for individual databases within
# a host.
#
# If creating the RDS host for the first time, you need to pick a name that
# is unique for all RDS hosts (DB instances) owned by your AWS account in the
# current region.
#
# DB instance identifier is case insensitive, but stored as all lower-case,
# as in "mydbinstance".
#
db_instance_id: devappstore

# You must set this variable:
# Here list the email addresses that get notified when somebody submits
# an App.
#
CONTACT_EMAILS: chirag24@uab.edu,aloraine@uncc.edu

# You must set this variable:
# App Store host name. This will be registered in DNS externally (and
# manually) after these playbooks are run.
# For the core IGB team, this will be a sub-domain of bioviz.org, the
# default value below.
#
ServerName: "{{ ec2_name }}.bioviz.org"

#
# If copying data from another AppStore, RDS instance id hosting the
# AppStore's database, the name of the database, and the name of the
# S3 bucket used to store its digital assets.
#
first_db_instance_id:
first_rds_db_name:
first_s3_bucket_name:

# You can safely ignore the remaining variables below. The default values
# are fine for Loraine Lab users.
ec2_region: us-east-1

# Develop or Prod
stack: Develop

# RDS host region
rds_region: "{{ ec2_region }}"

# Name of the MySQL database to be used by AppStore
rds_db_name: "{{ ec2_name }}"

# User name and password to be used by the AppStore to read/write
# to its personal database.
rds_db_user: "{{ ec2_name }}"
rds_db_pass: "{{ ec2_name }}"

# Each App Store gets its own dedicated S3 bucket
# for storing Apps and other digital assets.
s3_bucket_name: "{{ ec2_name }}-xyz"
s3_region: "{{ ec2_region }}"
Also, the s3 bucket for this new appstore instance has already been created during a previous run of the playbook. It is empty.
I missed adding a condition while copying the code. I have added that now. Please check.
PR: https://bitbucket.org/lorainelab/appstore-playbooks/pull-requests/33/igbf-2427-added-conditional-check-for/diff
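For reference, the failure above happened because first_s3_bucket_name is present in common.yml but left blank, so it is defined yet evaluates to None, which the aws_s3 module cannot handle. A guard along these lines would skip the copy tasks in that case (a sketch, not necessarily the exact change in the PR):

- name: List keys from Source Bucket to ensure it is present
  aws_s3:
    bucket: "{{ first_s3_bucket_name }}"
    mode: list
  when:
    - first_s3_bucket_name is defined
    - first_s3_bucket_name is not none
    - first_s3_bucket_name != s3_bucket_name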
Merged and ready for testing.
To test:
- Stand up an app store in your AWS account
- Check that you can use a non-bioviz domain and SSL certificate (so that organizations can stand up their own app stores independently from the core IGB team)
- Submit a test App
- Then try to stand up a second app store that clones the first App Store's data
- Submit the same test App, same version, to the second App Store and observe that the App Store recognizes it and does not allow re-submission of the same version.
Question for Chester Dias:
How is variable "ansible_ec2_instance_id" getting defined in
I don't see how this play can run because the ansible_ec2_instance_id does not appear to be set anywhere.
Answer to above question:
This line does it:
- ec2_metadata_facts:
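(For context: ec2_metadata_facts queries the EC2 instance metadata service on the host it runs against and sets facts such as ansible_ec2_instance_id, so later tasks can refer to the control node's own instance id. A minimal usage sketch:)

- ec2_metadata_facts:

- debug:
    msg: "This control node is instance {{ ansible_ec2_instance_id }}"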
copy is failing:
TASK [copy_s3 : Copy source bucket contents to destination (new) bucket] *****************************
fatal: [localhost]: FAILED! => changed=true
  cmd: aws s3 sync s3://testappstore-xyz s3://devappstore2-xyz
  delta: '0:00:00.595888'
  end: '2020-09-25 11:39:47.809347'
  msg: non-zero return code
  rc: 1
  start: '2020-09-25 11:39:47.213459'
  stderr: 'fatal error: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records.'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
As noted in meeting today, attaching new IAM roles to ec2's is not working as expected.
Am going to try this instead:
- When creating the control node, I will assign it an IAM role and never delete this role
- When needed, I will temporarily attach the s3 "copy" policy and add it to the role
cc: Chester Dias
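A rough sketch of that attach/copy/detach sequence using the AWS CLI (the role name control-node-role and the variable s3_copy_policy_arn are placeholders, and the pause length is only a guess; as the next comment notes, a pause after attaching the policy turned out to be necessary):

- name: Attach the temporary S3 copy policy to the control node's permanent role
  command: >
    aws iam attach-role-policy
    --role-name control-node-role
    --policy-arn "{{ s3_copy_policy_arn }}"

- name: Wait for the new credentials to become visible to the instance
  pause:
    seconds: 30

- name: Copy source bucket contents to destination bucket
  shell: "aws s3 sync s3://{{ first_s3_bucket_name }} s3://{{ s3_bucket_name }}"

- name: Detach the temporary policy once the copy completes
  command: >
    aws iam detach-role-policy
    --role-name control-node-role
    --policy-arn "{{ s3_copy_policy_arn }}"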
Above strategy worked. Needed to pause after attaching new policy to control node role.
Tested by standing up all-new appstore "devappstore3"
Got error on first try:
TASK [ec2 : Create devappstore3 if does not exist] ****************************************************************
fatal: [localhost]: FAILED! => changed=false
  msg: 'Instance creation failed => InvalidParameterValue: Value (devappstore3) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name'
PLAY RECAP ********************************************************************************************************
localhost : ok=19 changed=10 unreachable=0 failed=1 skipped=4 rescued=0 ignored=0
Worked on second try.
Tested replication of existing database and s3 bucket. Works. Moving to closed.
Suggestion: Look at Ansible Galaxy for code we can import that does this. I bet someone has written something to do this. Basically, the job is to mirror an s3 bucket. Surely this has been written already!