IGB / IGBF-3571

Identify folders and files on HPC for removal

    Details

    • Type: Task
    • Status: Closed
    • Priority: Trivial
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Since we have consumed 83TB of storage space on the cluster, it is likely that many things can be removed to free up space.

      We have run some datasets many times, and we only really care about the runs that led to IDB deposition and that are on data from after SRA submission.

      The goal of this ticket is to identify candidate folders that may be suitable for removal.

      We will make a list of these folders and then have it reviewed by Ann, Molly and Rob to ensure nothing of importance gets removed!

      The list can be posted in the comments, with the exact cluster location.

          Activity

          Mdavis4290 Molly Davis added a comment -

          Everything makes sense, and this is a good place to end the ticket.

          robofjoy Robert Reid added a comment - - edited

          For now, we have plenty of space available: 13T.

          /projects/tomato_genome 90T 78T 13T 86% aqu-fs19 171T 123T 49T 72%

          We can revisit later when we consume this space.
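
          One way to know when to revisit is to check the used percentage from a script. The sketch below assumes figures like the ones above can be read with `df`; the function names `usage_pct` and `over_threshold` are hypothetical and not part of any existing tooling here.

```shell
# Print the percentage of space used on the filesystem holding a path.
# -P forces POSIX output: one header line, then one line per filesystem.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Succeed when usage is at or above a limit (hypothetical helper).
# $1 = path, $2 = limit in percent
over_threshold() {
    [ "$(usage_pct "$1")" -ge "$2" ]
}
```

          For example, `over_threshold /projects/tomato_genome 90 && echo "time to clean up"` would print a reminder once usage reaches 90%; the 90 is an arbitrary example value.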

          For review:
          Someone should read over these notes and ensure they make sense. Then we can close the ticket.

          robofjoy Robert Reid added a comment - - edited

          Chopping out an old analysis that I did.

          sra44-apr2020

          540K checkbams
          48K lightHarvestingComplex
          92M megamega
          8.1T rnaseq-phase1
          6.6T sra44-apr2020
          4.0K sra44-sra22-pluscounts.txt
          28M timecluster

          We have since run this one dataset many other ways, and it is not needed. Removing it frees up 6.6 TB!!
          And it is 4 years old.
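
          Age can serve as a rough first filter for other candidates. This is a minimal sketch, assuming GNU or BSD `find`; `old_folders` is a hypothetical helper name, and a match is only a hint that still needs review.

```shell
# List immediate subfolders whose own modification time is older than N days.
# Note: a folder's mtime changes only when entries directly inside it are
# added, removed, or renamed, so this is a rough filter, not proof of disuse.
# $1 = parent directory, $2 = age in days
old_folders() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2"
}
```

          Running `old_folders . 1460` lists subfolders of the current directory whose mtime is more than roughly four years old.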

          robofjoy Robert Reid added a comment -

          Of the 80T, this is a breakdown of the folders:

          8.3T alt_splicing
          294G annswork
          51T dataprocessing
          2.5G db
          3.3T genome
          1.2T nfcore_rnaseq
          1.5T rnaseq-phase2-areualive
          15T robswork
          9.9G scripts
          0 ZZZ_pending_deletion

          robofjoy Robert Reid added a comment -

          The top folders:

          3.4T /projects/tomato_genome/rnaseq
          0 /projects/tomato_genome/SV100
          71G /projects/tomato_genome/genome
          718G /projects/tomato_genome/db
          298M /projects/tomato_genome/sw
          4.9G /projects/tomato_genome/scripts
          80T /projects/tomato_genome/fnb

          So almost all of the data is in fnb, as expected.
          Let's ignore the other folders and focus on just that.

          robofjoy Robert Reid added a comment -

          A quick glance at the folders in /projects/tomato_genome:

          $ du -sh
          84T

          There are still many files I don't have access to. These are likely Ann-controlled folders.

          Next, I need to get the size of each folder to see where most of our data is.

          Mdavis4290 Molly Davis added a comment - - edited

          Used the 'chmod -R g+w *' command on all of the 'dataprocessing' directories that I have permissions to.
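
          Two caveats may apply to that command: under default shell globbing, `*` skips hidden top-level entries, and `g+w` grants write access but not read or directory traversal. If the goal is for group members to see the files, a variant like the sketch below might fit better; `open_to_group` is a hypothetical helper name.

```shell
# Grant the group read/write on everything the caller owns under a directory,
# plus execute (traverse) on directories and on files that are already executable.
# Starting from the directory itself, rather than from *, includes hidden entries.
# $1 = directory to open up
open_to_group() {
    # -user limits the change to the caller's own files, which are the only
    # ones a non-root user is allowed to chmod.
    find "$1" -user "$(id -u)" -exec chmod g+rwX {} +
}
```

          The capital `X` adds execute permission only to directories and to files that already have an execute bit, so plain data files do not become executable.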

          robofjoy Robert Reid added a comment -

          Some files I cannot see due to permissions.

          .fnb/dataprocessing/*

          I am guessing that these are Molly's files and folders!
          Can we get those permissions changed?

          Outside of those folders, we have 56,698 files in /projects/tomato_genome!
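
          A count like this, and a list of the folders that block it, can be produced roughly as follows. This is a sketch, assuming GNU `find` (the `-readable` and `-executable` tests are GNU extensions); both function names are hypothetical.

```shell
# Count the regular files that find can reach under a directory.
# File names containing newlines would be counted more than once.
count_files() {
    find "${1:-.}" -type f 2>/dev/null | wc -l
}

# List directories the current user cannot read or enter.
# -prune stops find from trying to descend into them.
unreadable_dirs() {
    find "${1:-.}" -type d \( ! -readable -o ! -executable \) -prune -print 2>/dev/null
}
```

          The output of `unreadable_dirs` gives the exact paths to send to whoever owns them when asking for a permissions change.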


            People

            • Assignee: Robert Reid
            • Reporter: Robert Reid
            • Votes: 0
            • Watchers: 2
