IGBF-2686

Add creation of BioViz role and DynamoDB table needed by bar.html

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 1
    • Sprint: Fall 7 Dec 14 - Dec 23, Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19, Spring 1 2021 Mar 22 - Apr 2, Spring 2 2021 Apr 5 - Apr 16

      Description

      The CGI script "geneIdLookup.py" in cgi-bin of the bioviz code base connects to a small dynamodb table during operation of bar.html code.

      Add provisioning and configuration of the dynamodb resource to the bioviz playbooks.

        Attachments

          Activity

          ann.loraine Ann Loraine added a comment -

          A role "Bioviz" needs to be created that has AmazonDynamoDBReadOnlyAccess policy.

          ann.loraine Ann Loraine added a comment -

          A table named "Araport11" needs to be created and populated. FYI: It will likely never change and it is very small.

          ann.loraine Ann Loraine added a comment -

          Code used to create the table: https://bitbucket.org/lorainelab/genomesource/src/master/IGBF-1495/

          ann.loraine Ann Loraine added a comment - - edited

          Data used to populate the table: http://igbquickload.org/quickload/A_thaliana_Jun_2009/Araport11.bed.gz

          Data file is also version-controlled in a subversion repo: https://svn.bioviz.org/viewvc/genomes/quickload/

          ann.loraine Ann Loraine added a comment -

          To test whether the DynamoDB access is working, open this URL, substituting your bioviz hostname: https://www.bioviz.org/cgi-bin/geneIdLookup.py

          cdias1 Chester Dias (Inactive) added a comment -

          [~aloraine] This is the command to export the full contents of the DynamoDB table as JSON:
          aws dynamodb scan --table-name TABLE_NAME --region us-east-1 > export.json

          The above command has to be run from an EC2 server with a role that permits reading the contents of the DynamoDB table.
          Please share that file with me.
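
For anyone reading the exported file: the output of a scan is DynamoDB-typed JSON, with records under a top-level "Items" key and each attribute wrapped in a type descriptor such as "S" (string) or "N" (number, sent as a string). A minimal sketch for flattening it into plain dictionaries follows. The function name is hypothetical, it handles only the "S" and "N" descriptors, and it assumes every number is an integer.

```python
import json


def plain_items(scan_output):
    """Flatten DynamoDB-typed items into plain dicts (S and N types only)."""
    items = []
    for raw in scan_output["Items"]:
        item = {}
        for name, typed in raw.items():
            if "S" in typed:
                item[name] = typed["S"]
            elif "N" in typed:
                # Numbers arrive as strings; assumed here to be integers.
                item[name] = int(typed["N"])
            else:
                raise ValueError(f"unsupported attribute type for {name!r}")
        items.append(item)
    return items


def load_export(path):
    """Read a scan export (for example export.json) and flatten it."""
    with open(path) as handle:
        return plain_items(json.load(handle))
```

Calling load_export("export.json") would return a list with one plain dictionary per table item.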

          ann.loraine Ann Loraine added a comment -

          Data are available (temporarily) at the following URL: https://www.bioviz.org/export.json

          cdias1 Chester Dias (Inactive) added a comment -

          The CLI doesn't allow more than 25 items to be loaded into the table in a single call.
          I have started using the .py scripts mentioned earlier and am trying to integrate them with the playbooks.

          ann.loraine Ann Loraine added a comment -

          Thanks for the update!

          cdias1 Chester Dias (Inactive) added a comment -

          Process
          1. Grant the EC2 role DynamoDB admin privileges so it can provision infrastructure
          2. Create the DynamoDB table if not present
          3. Copy the data and Python script to EC2
          4. Load the data using the Python script if the data is not present
          5. Remove admin from the EC2 role and grant it read-only access

          Please review: https://bitbucket.org/chesterdias/appstore-playbooks/branch/IGBF-2686#diff
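
Step 2 above ("create the table if not present") could be sketched roughly as follows. This is not the code in the linked branch. It assumes a boto3 DynamoDB client is passed in, and the key schema (a single string hash key named GeneId) is a guess, because the thread does not state the table's actual schema. The function name is hypothetical.

```python
def ensure_table(client, table_name):
    """Create the table only if it is missing; return True if it was created."""
    try:
        client.describe_table(TableName=table_name)
        return False  # table already present, nothing to do
    except client.exceptions.ResourceNotFoundException:
        pass  # fall through and create it

    client.create_table(
        TableName=table_name,
        # Assumed schema: one string hash key called GeneId.
        AttributeDefinitions=[{"AttributeName": "GeneId", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "GeneId", "KeyType": "HASH"}],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    # Block until the new table is ready to accept writes.
    client.get_waiter("table_exists").wait(TableName=table_name)
    return True
```

With boto3 installed, this would be called as ensure_table(boto3.client("dynamodb", region_name="us-east-1"), "Araport11"). Running it a second time is a no-op, which keeps the playbook safe to re-run.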

          ann.loraine Ann Loraine added a comment - - edited

          Chester Dias - Please confirm that the playbook "setup.yml" can run from start to finish in a non-lorainelab account.

          ann.loraine Ann Loraine added a comment -

          Please also check that the site works, as well – main one to check is that the bar.js and bar.html functionality is working properly, since these are the parts of the site that are using the dynamodb.

          ann.loraine Ann Loraine added a comment - - edited

          To test, please team up with another developer for a Zoom session. Working together, walk through the key parts of the site to ensure they are working correctly.
          To learn how to test bar.html and bar.js, read the relevant sections of this paper, which describes the functionality: https://pubmed.ncbi.nlm.nih.gov/31350781/

          To test with the efp-browser mentioned in the paper, copy the URLs for "view in IGB" shown in the efp-browser. Enter these links into a web browser, replacing the hostname in each with the address of your bioviz mirror site. You may also need to modify your local /etc/hosts file to "trick" your browser into thinking that your bioviz mirror site's "hostname" resolves to its public IP address. (This may be necessary if your bioviz mirror's hostname is not currently registered in DNS.)
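
If there are many links to try, the hostname swap can be scripted. The sketch below uses only the standard library. The function name, the mirror hostname, and the query parameter in the usage example are all hypothetical; only the idea of replacing the host comes from the instructions above.

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_mirror(url, mirror_host):
    """Return the same URL with its host replaced by mirror_host."""
    parts = urlsplit(url)
    # Keep scheme, path, query and fragment; swap only the network location.
    return urlunsplit(parts._replace(netloc=mirror_host))
```

For example, point_at_mirror("https://www.bioviz.org/bar.html?gene_id=AT1G01010", "mirror.example.org") keeps the path and query string and changes only the host.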

          cdias1 Chester Dias (Inactive) added a comment -

          [~aloraine] I have noticed that loading the whole data set with the py script takes more than 12 minutes. Would you like me to see if we can improve that script? It gives the impression that the Ansible session has got stuck.

          ann.loraine Ann Loraine added a comment -

          Sure, I would say that counts as a bug that ought to be addressed. The table itself is tiny, so it does not make sense that it takes so long.

          cdias1 Chester Dias (Inactive) added a comment -

          I was able to modify the configuration to increase the write speed. [~aloraine] Could you please let me know how often we will be writing to this table? There appears to be a monthly charge associated with the write capacity units. If we are not using them, we can dial that value down after populating all the data.

          cdias1 Chester Dias (Inactive) added a comment -

          Read/write speed is controlled by the settings below:
          Provisioned read capacity units: 5 (Auto Scaling Disabled)
          Provisioned write capacity units: 50 (Auto Scaling Disabled)

          Per the AWS specification, a single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests.

          Reference: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
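
Because of the 25-request cap, a loader has to split its records into groups before calling BatchWriteItem. A minimal, library-free sketch of that splitting step follows. The helper name is hypothetical, and the actual write call is deliberately left out.

```python
def batches(items, size=25):
    """Yield consecutive slices of items, each holding at most size entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each slice would then be sent as one BatchWriteItem call, so 60 records need three calls (25, 25 and 10).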

          cdias1 Chester Dias (Inactive) added a comment -

          [~aloraine] Could you please review the write capacity units at https://console.aws.amazon.com/dynamodb/home?region=us-east-1#tables:selected=Araport11;tab=capacity

          It is set to 20 (a $10 per month charge) at the moment, but if we are not using writes we can dial this down to 5, which is the default.

          ann.loraine Ann Loraine added a comment -

          Thank you. I have set the write capacity to 1. Is that OK?

          cdias1 Chester Dias (Inactive) added a comment -

          It should be fine to set it to 5, as long as we are not writing. There could be some internal meta writes.

          cdias1 Chester Dias (Inactive) added a comment -

          Review: https://bitbucket.org/chesterdias/appstore-playbooks/commits/2aa3297973d9671b27c15ae06c9ae56687bf2dab?at=IGBF-2686

          Chirag Chandrahas Shetty Please try running this from your system as a fresh run E2E, as Dr. Loraine suggested earlier before the holiday break. I would get an idea then of what more needs to be done to simplify this and improve where we can, based on your execution difficulties. I will remain available during the next week so message me any time about any issues you come across.

          ann.loraine Ann Loraine added a comment -

          Requests:

          • Please modify the loading strategy to use the original BED file and not a JSON file to load the data; please see scripts written by Charan V. (https://bitbucket.org/lorainelab/genomesource/src/master/IGBF-1495/)
          • Use an ansible svn module to obtain the BED file by "checking out" a read-only single directory where the bed file resides. Do NOT check out the entire repository as it is huge.
          • To support the above, add new default configurations to (a) specify the specific svn directory to be checked out and (b) specify the checked-out bed file to be used (Araport11.bed.gz). Use the current convention of specifying defaults by adding these to the role's "defaults" directory (for example, see https://bitbucket.org/lorainelab/bioviz-playbooks/src/master/roles/clone/defaults/main.yml)

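
To illustrate loading from the BED file directly, here is a rough sketch of turning BED lines into records with the four attribute names used by the loading script (Chromosome, Start, End, GeneId). It is not the code from the IGBF-1495 scripts. Taking GeneId from the fourth BED column as-is is an assumption, and both function names are hypothetical.

```python
import gzip


def bed_items(lines):
    """Yield one record per BED feature line, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chromosome, start, end, name = fields[:4]
        yield {
            "Chromosome": chromosome,
            "Start": int(start),
            "End": int(end),
            "GeneId": name,  # assumption: column 4 holds the identifier to store
        }


def read_bed_gz(path):
    """Stream records from a gzip-compressed BED file such as Araport11.bed.gz."""
    with gzip.open(path, "rt") as handle:
        yield from bed_items(handle)
```

Reading the compressed file line by line avoids both the intermediate JSON export and the need to decompress the file on disk.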
          cdias1 Chester Dias (Inactive) added a comment -

          Please review: https://bitbucket.org/chesterdias/appstore-playbooks/commits/55b6824dbec009385a0eb4ef7df0cdac54f13c0a

          ann.loraine Ann Loraine added a comment - - edited

          Looks like the new code is referring to the appstore playbooks. The appstore actually does not need any dynamodb code. Please get the latest copy of the master branch of the bioviz-playbooks repository and transfer the new code over to there.

          I'm in the middle of restructuring the bioviz-playbooks and will push my new code later today or Monday. Wait for that to happen before creating your new bioviz-playbooks branch.

          cdias1 Chester Dias (Inactive) added a comment -

          Please review: https://bitbucket.org/chesterdias/bioviz-playbooks-chester-local/branch/IGBF-2686#diff

          ann.loraine Ann Loraine added a comment -

          question about:

                 if not request:
          +            request.append({
          +                'Chromosome': Chromosome,
          +                'Start': Start,
          +                'End': End,
          +                'GeneId': GeneId,
          +            })
          

          in https://bitbucket.org/chesterdias/bioviz-playbooks-chester-local/branch/IGBF-2686#chg-roles/dynamodb/templates/load_svn_data.py.j2

          cdias1 Chester Dias (Inactive) added a comment -

          That came from https://bitbucket.org/lorainelab/genomesource/src/master/IGBF-1495/createTable.py

          if not data:
                      data.append({
                          'Chromosome': Chromosome,
                          'Start': Start,
                          'End': End,
                          'GeneId': GeneId,
                      })

          My understanding is that it was possibly done to prevent adding duplicate entries with lines 28 to 53 in my code, since the body of the for loop cannot execute when the request is empty.

          ann.loraine Ann Loraine added a comment -

          Please investigate: What happens if "data" (or "request") is "None"?

          cdias1 Chester Dias (Inactive) added a comment -

          data is declared as an empty array on line 9. Since there is no first element in `data` for the first execution, running a for loop will throw an error. The IF part of the logic is only to handle first element. All the remaining executions will go through the else part of the logic running the for loop.

          ann.loraine Ann Loraine added a comment -

          OK I understand. Please modify the "if" statement to literally test the size of the array instead of relying on a quirk of the language in which an empty array evaluates to False in "if" statements.
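
A sketch of what the explicit test might look like, which also covers the earlier question about None. This is an illustration and not the merged code: the function name is hypothetical, and the duplicate check on GeneId is an assumption about what the original loop was for. For reference, in Python a for loop over an empty list simply runs zero times, whereas len(None) raises TypeError, so None is rejected up front here.

```python
def add_record(data, record):
    """Append record unless its GeneId is already present; return True if added."""
    if data is None:
        # Fail loudly instead of letting None be mistaken for an empty list.
        raise ValueError("data must be a list, not None")
    if len(data) == 0:
        # Explicit size test for the first record.
        data.append(record)
        return True
    for existing in data:
        if existing["GeneId"] == record["GeneId"]:
            return False  # duplicate, skip it
    data.append(record)
    return True
```

With the guard in place, passing None fails with a clear message, while an empty list takes the first-record branch.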

          cdias1 Chester Dias (Inactive) added a comment -

          Please review: https://bitbucket.org/chesterdias/bioviz-playbooks-chester-local/branch/IGBF-2686#chg-roles/dynamodb/templates/load_svn_data.py.j2

          cdias1 Chester Dias (Inactive) added a comment -

          https://bitbucket.org/lorainelab/bioviz-playbooks/pull-requests/9/igbf-2686

          ann.loraine Ann Loraine added a comment -

          Merged.

          ann.loraine Ann Loraine added a comment -

          Making some changes / updates to the code:

          • Define dynamo_db_table_name to be Araport11
          • Add role to aws.yml
          ann.loraine Ann Loraine added a comment -

          The loading script appears to search through the entire "results" list, from start to finish, each time it reads a line.
          Fixing this.
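
One common way to avoid rescanning a list for every input line is to track the keys already seen in a set, which makes each membership check constant-time on average. The sketch below assumes the scan was there to detect repeated GeneId values. The thread does not show the actual fix, so treat this as an illustration only, with a hypothetical function name.

```python
def unique_by_gene_id(records):
    """Keep the first record for each GeneId, preserving input order."""
    seen = set()
    results = []
    for record in records:
        gene_id = record["GeneId"]
        if gene_id in seen:
            continue  # already have this gene, skip without scanning results
        seen.add(gene_id)
        results.append(record)
    return results
```

This does work proportional to the number of lines, instead of comparing every line against every earlier result.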

          ann.loraine Ann Loraine added a comment -

          Also will enable AWS and other variables to be passed as command line options. This will enable the script to be run with no editing required.
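
A minimal sketch of what such command-line options could look like with argparse. The option names and the function name are hypothetical. The defaults for the table name and region are the values mentioned elsewhere in this issue.

```python
import argparse


def build_parser():
    """Build a parser so the loader can run without editing the script."""
    parser = argparse.ArgumentParser(
        description="Load a BED file into a DynamoDB table."
    )
    parser.add_argument("bed_file", help="path to the BED file, e.g. Araport11.bed.gz")
    parser.add_argument("--table", default="Araport11", help="DynamoDB table name")
    parser.add_argument("--region", default="us-east-1", help="AWS region")
    return parser
```

The script would then call build_parser().parse_args() and read args.bed_file, args.table and args.region instead of hard-coded values.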

          ann.loraine Ann Loraine added a comment - - edited

          How to get a single file out of a subversion repository without checking out an entire directory:

          svn export https://svn.bioviz.org/repos/genomes/quickload/A_thaliana_Jun_2009/Araport11.bed.gz --username=guest --password=guest 
          

            People

            • Assignee: Ann Loraine
            • Reporter: Ann Loraine
            • Votes: 0
            • Watchers: 3
