  1. IGB
  2. IGBF-3201

Investigate why Jira goes down with a 503

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: Jira keeps going down with a 503.

      Task: Determine why Jira is going down.

      Note:

      To restart the Jira server:

      • Log in to the Jira host over ssh as user "ec2-user".
      • Switch to the root user (sudo su).
      • Change to the "bin" directory: /home/ec2-user/jira/atlassian-jira-software-7.0.11-standalone/bin
      • Shut the server down cleanly, in case it is still running, by executing the "stop jira" script: stop-jira.sh

      Note that if the server has crashed, there is probably still a "stale" PID file. If so, the script will report the stale PID file when you attempt to stop Jira.

      • Restart Jira by running the "start jira" script: start-jira.sh
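
The restart steps above can be sketched as a small shell function. This is a minimal sketch, assuming it runs on the Jira host as root and that stop-jira.sh and start-jira.sh sit in the "bin" directory named in the description. The function name restart_jira and the JIRA_BIN variable are illustrative and not part of the Jira distribution.

```shell
#!/bin/sh
# Assumption: the "bin" directory from the ticket; override JIRA_BIN to use another path.
JIRA_BIN="${JIRA_BIN:-/home/ec2-user/jira/atlassian-jira-software-7.0.11-standalone/bin}"

# Runs in a subshell (note the parentheses) so the caller's working
# directory is left unchanged.
restart_jira() (
    cd "$JIRA_BIN" || exit 1
    # Stop first in case Jira is still running. After a crash the stop script
    # may only report a stale PID file, so a failure here is not fatal.
    ./stop-jira.sh || echo "stop-jira.sh exited non-zero; continuing" >&2
    ./start-jira.sh
)
```

After "sudo su" on the host, calling restart_jira performs the stop and start in order.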

      To enable Nowlan Freese to carry out the above workflow, I did the following:

      • Added his public key to the authorized keys on jira.bioviz.org
      • Modified the EC2 instance and its attached security group "jira3" so that he can restart the instance and modify its security group
      • Used the IAM policy simulator on AWS to check whether his user name can edit the security group; it confirmed that he can.

        Attachments

          Activity

          nfreese Nowlan Freese created issue -
          nfreese Nowlan Freese made changes -
          Field Original Value New Value
          Epic Link IGBF-2323 [ 18477 ]
          ann.loraine Ann Loraine made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine added a comment - - edited

          Upon casual inspection, I did not determine why the site crashed. However, I did notice that our support license had expired. I requested a new license for "Jira software" using the Atlassian Web site. The order number is AT-205014171. Clicking on my account page at Atlassian revealed the new license, which I entered into the Jira administration page. I had to make a second order for the "Jira core". That order number was AT-205014270. I obtained the new licenses and added them to the site using an "admin" screen.

          ann.loraine Ann Loraine added a comment -

          Licenses are now up-to-date until Oct 2023.

          ann.loraine Ann Loraine added a comment -

          Found this page with common causes for jira tomcat crashing:

          https://confluence.atlassian.com/jirakb/common-causes-for-jira-server-crashes-and-performance-issues-203394749.html

          When I logged onto the server after it crashed, I observed there was a "stale" process id for tomcat. Restarting the server was easy using the provided script, however. The presence of the stale PID file did not affect my attempt to restart the Jira/tomcat process.
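
A PID file is stale when it names a process that is no longer running. The check can be sketched as below, assuming it is run as root (kill -0 also fails for a live process the caller is not allowed to signal). The location of Jira's PID file is not given in this ticket, so it is passed as an argument, and the function name pid_file_is_stale is hypothetical.

```shell
#!/bin/sh
# Return 0 (true) if the PID file exists but no process with that PID is running.
pid_file_is_stale() {
    pid_file="$1"
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only checks that the process exists
    # and that the caller may signal it.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Called with the path of the PID file as its only argument, the function succeeds only when the file is left over from a process that has exited.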

          ann.loraine Ann Loraine added a comment - - edited

          The page on "java crashes" recommends looking for Java crash logs in the "bin" directory of the Jira installation.
          On our current system setup, this "bin" directory is here:

          • /home/ec2-user/jira/atlassian-jira-software-7.0.11-standalone/bin

          There are three such java crash log files:

          /home/ec2-user/jira/atlassian-jira-software-7.0.11-standalone/bin
          jira.bioviz.org ec2-user $ ls -lh *.log
          -rw-r--r-- 1 root root 531K Oct 16 03:58 hs_err_pid23961.log
          -rw-r--r-- 1 root root  20K Feb  2  2022 hs_err_pid28078.log
          -rw-r--r-- 1 root root 544K Oct 14 01:05 hs_err_pid2960.log
          

          The top part of each file:

          jira.bioviz.org ec2-user $ head *.log
          ==> hs_err_pid23961.log <==
          #
          # There is insufficient memory for the Java Runtime Environment to continue.
          # Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
          # Possible reasons:
          #   The system is out of physical RAM or swap space
          #   The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
          # Possible solutions:
          #   Reduce memory load on the system
          #   Increase physical memory or swap space
          #   Check if swap backing store is full
          
          ==> hs_err_pid28078.log <==
          #
          # There is insufficient memory for the Java Runtime Environment to continue.
          # Native memory allocation (mmap) failed to map 268435456 bytes for committing reserved memory.
          # Possible reasons:
          #   The system is out of physical RAM or swap space
          #   The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
          # Possible solutions:
          #   Reduce memory load on the system
          #   Increase physical memory or swap space
          #   Check if swap backing store is full
          
          ==> hs_err_pid2960.log <==
          #
          # There is insufficient memory for the Java Runtime Environment to continue.
          # Native memory allocation (mmap) failed to map 24641536 bytes for committing reserved memory.
          # Possible reasons:
          #   The system is out of physical RAM or swap space
          #   The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
          # Possible solutions:
          #   Reduce memory load on the system
          #   Increase physical memory or swap space
          #   Check if swap backing store is full
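
The size of each failed allocation can be pulled out of the log headers without reading them by hand. The sketch below assumes the header wording shown above ("failed to map N bytes"); the function name failed_allocations is illustrative.

```shell
#!/bin/sh
# Print "<file>: <N> bytes" for each log whose header reports a failed
# native memory allocation. Files without such a line are skipped.
failed_allocations() {
    for f in "$@"; do
        bytes=$(sed -n 's/.*failed to map \([0-9][0-9]*\) bytes.*/\1/p' "$f" | head -n 1)
        if [ -n "$bytes" ]; then
            echo "$f: $bytes bytes"
        fi
    done
}
```

Run from the "bin" directory as: failed_allocations hs_err_pid*.log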
          
          ann.loraine Ann Loraine added a comment -

          The two crashes last week (Oct 14 and Oct 16, going by the hs_err log timestamps) happened because Java ran out of memory.

          ann.loraine Ann Loraine made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ]
          ann.loraine Ann Loraine added a comment - - edited

          [~aloraine] to add NF public key to host. Also, enable NF user to modify security group for the host.

          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ]
          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          ann.loraine Ann Loraine made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ]
          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 4 2022 Oct 10 [ 156 ] Fall 4 2022 Oct 10, Fall 5 2022 Oct 24 [ 156, 157 ]
          nfreese Nowlan Freese made changes -
          Rank Ranked higher
          nfreese Nowlan Freese added a comment -

          I am able to modify the security group for the jira3 EC2 and I am able to ssh onto the server.

          Closing ticket.

          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Ann Loraine [ aloraine ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          nfreese Nowlan Freese made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          nfreese Nowlan Freese made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          nfreese Nowlan Freese made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          nfreese Nowlan Freese made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          nfreese Nowlan Freese made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
          ann.loraine Ann Loraine made changes -
          Resolution Done [ 10000 ]
          Status Closed [ 6 ] To-Do [ 10305 ]
          ann.loraine Ann Loraine added a comment - - edited

          Having problems again with the jira host. Error message from hs_err_pid2261.log:

          # There is insufficient memory for the Java Runtime Environment to continue.
          # Native memory allocation (mmap) failed to map 7340032 bytes for committing reserved memory.
          # Possible reasons:
          #   The system is out of physical RAM or swap space
          #   The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap

          Host has little disk space left on the root volume (96% used):

          jira.bioviz.org ec2-user $ df -h
          Filesystem      Size  Used Avail Use% Mounted on
          devtmpfs        2.5G     0  2.5G   0% /dev
          tmpfs           2.5G     0  2.5G   0% /dev/shm
          tmpfs           2.5G   41M  2.4G   2% /run
          tmpfs           2.5G     0  2.5G   0% /sys/fs/cgroup
          /dev/nvme0n1p1  100G   96G  4.3G  96% /
          tmpfs           497M     0  497M   0% /run/user/1000
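
A nearly full volume like this one can be flagged automatically. The sketch below reads df output on standard input and prints every mount point at or above a usage threshold. The function name full_filesystems and the default threshold of 90 percent are arbitrary choices for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Read df output (header plus one row per file system) on stdin and print
# "<mount point> <use%>" for rows at or above the threshold (default 90).
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the percent sign from the fifth column
        if (use + 0 >= t + 0) print $6 " " $5
    }'
}
```

Example: df -P | full_filesystems 90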
          
          ann.loraine Ann Loraine made changes -
          Sprint Fall 4 2022 Oct 10, Fall 5 2022 Oct 24 [ 156, 157 ] Fall 4 2022 Oct 10, Fall 5 2022 Oct 24, Spring 6 2023 Mar 20 [ 156, 157, 166 ]
          ann.loraine Ann Loraine added a comment -

          Modifying the volume: increasing it from 100 GB to 150 GB.
          After resizing, the Linux file system needs to be extended, according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-modify-volume.html
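
For reference, extending the file system usually takes two commands: one to grow the partition and one to grow the file system. The sketch below only prints the commands so they can be reviewed before being run as root. It assumes the root partition is /dev/nvme0n1p1 (taken from the df output in this ticket); the file system type is not stated in the ticket, so it is passed as an argument, and the function name grow_commands is hypothetical.

```shell
#!/bin/sh
# Print (do not run) the commands that extend partition 1 of /dev/nvme0n1
# and the root file system. Argument: file system type, xfs or ext4.
grow_commands() {
    fstype="$1"
    case "$fstype" in
        xfs)  grow_fs="xfs_growfs -d /" ;;
        ext4) grow_fs="resize2fs /dev/nvme0n1p1" ;;
        *)    echo "unsupported file system type: $fstype" >&2; return 1 ;;
    esac
    echo "growpart /dev/nvme0n1 1"
    echo "$grow_fs"
}
```

The file system type can be read from the second column of "df -T /". Depending on the image, the partition may already have been grown automatically at reboot, in which case growpart reports that there is nothing to change.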

          ann.loraine Ann Loraine added a comment -

          Increased disk space and rebooted EC2. New space is available:

          jira.bioviz.org ec2-user $ df -h
          Filesystem      Size  Used Avail Use% Mounted on
          devtmpfs        2.5G     0  2.5G   0% /dev
          tmpfs           2.5G     0  2.5G   0% /dev/shm
          tmpfs           2.5G  400K  2.5G   1% /run
          tmpfs           2.5G     0  2.5G   0% /sys/fs/cgroup
          /dev/nvme0n1p1  150G   95G   56G  64% /
          tmpfs           497M     0  497M   0% /run/user/1000
          

          Checked whether the httpd server was running. It was not. Started it and then started Jira and Confluence manually using the vendor-provided startup scripts.

          Then checked the backup S3 bucket to see whether backups had stopped being made.

          Backups were indeed still being made. Deleted stale backups.

          ann.loraine Ann Loraine added a comment -

          Jira and confluence (https://wiki.bioviz.org/confluence/display/igbman) are both back up. Moving to closed.

          ann.loraine Ann Loraine made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          ann.loraine Ann Loraine made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          ann.loraine Ann Loraine made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          ann.loraine Ann Loraine made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          ann.loraine Ann Loraine made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          ann.loraine Ann Loraine made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

            People

            • Assignee:
              ann.loraine Ann Loraine
              Reporter:
              nfreese Nowlan Freese
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: