As a sanity check, I ran samtools sort on a SAM file whose header and reads were both deliberately scrambled, to see how it would behave.
The starting SAM file (unsorted.sam) is:
@SQ SN:chr3 LN:198022430
@SQ SN:chr15 LN:102531392
@SQ SN:chr1 LN:249250621
SOLEXA-1GA-2_2_FC20EMB:5:19:384:574 0 chr1 540627 25 36M * 0 0 AAGGGGCGGCCTGTGGCGTTTTCCTGTAAAGTTGGG hhThhhVa\XOh_JVFDLLQZGCEDFE?B?DGAA>C NM:i:0 X0:i:1 MD:Z:36
SOLEXA-1GA-2_2_FC20EMB:5:3:835:634 0 chr1 159002 25 36M * 0 0 ACACAAACACACAAACACACACACACACAACCCCAA ^ThgYWLRQNNWNHLQGMDIHIFLFKGI?DHAEDA@ NM:i:1 X1:i:1 MD:Z:13A15A6
SOLEXA-1GA-2_2_FC20EMB:5:123:425:446 0 chr15 22809005 25 36M * 0 0 GTGCAGTCCACCAGCAAGGGAGCTGTGACGAGAGTA ]K\HLJGGFLGEIEB?FIBFEIDCC@>EA>@??>=? NM:i:1 X1:i:1 MD:Z:28C3A2A0
SOLEXA-1GA-2_2_FC20EMB:5:241:482:157 0 chr15 22691475 25 36M * 0 0 TCAGTGATAGAGCAAGAAAACAAAATGGGTTTCCTG hhhhhhhhhhbhafWeXW]TOLTHUTVTWHWUK@MC NM:i:0 X0:i:1 MD:Z:36
SOLEXA-1GA-2_2_FC20EMB:5:118:97:121 0 chr3 870368 25 36M * 0 0 TAGCATAGAGTGGTATAACACATTTAAGTATGAAAG hhhhhhhhhhhhhhhhchhhhOhhhYZhcc[TJWCS NM:i:1 X1:i:1 MD:Z:0T35
SOLEXA-1GA-2_2_FC20EMB:5:44:973:514 0 chr3 682558 25 36M * 0 0 CACACACATACACATACATACACACCAACCAACACA hhghhhhhhh]h`ehh\YdXd`haVQKPKJHMEGFF NM:i:1 X1:i:1 MD:Z:20C4C3C6
Note that the header lists the chromosomes in the order chr3, chr15, chr1, while the reads appear in the order chr1, chr15, chr3 and are not sorted by position within each chromosome.
I ran the following commands (samtools version 1.9):
samtools view -bh unsorted.sam > unsorted.bam
samtools sort -o sorted.bam unsorted.bam
samtools index sorted.bam
samtools view -h sorted.bam > sorted.sam
The output SAM file (sorted.sam) is:
@HD VN:1.6 SO:coordinate
@SQ SN:chr3 LN:198022430
@SQ SN:chr15 LN:102531392
@SQ SN:chr1 LN:249250621
SOLEXA-1GA-2_2_FC20EMB:5:44:973:514 0 chr3 682558 25 36M * 0 0 CACACACATACACATACATACACACCAACCAACACA hhghhhhhhh]h`ehh\YdXd`haVQKPKJHMEGFF NM:i:1 X1:i:1 MD:Z:20C4C3C6
SOLEXA-1GA-2_2_FC20EMB:5:118:97:121 0 chr3 870368 25 36M * 0 0 TAGCATAGAGTGGTATAACACATTTAAGTATGAAAG hhhhhhhhhhhhhhhhchhhhOhhhYZhcc[TJWCS NM:i:1 X1:i:1 MD:Z:0T35
SOLEXA-1GA-2_2_FC20EMB:5:241:482:157 0 chr15 22691475 25 36M * 0 0 TCAGTGATAGAGCAAGAAAACAAAATGGGTTTCCTG hhhhhhhhhhbhafWeXW]TOLTHUTVTWHWUK@MC NM:i:0 X0:i:1 MD:Z:36
SOLEXA-1GA-2_2_FC20EMB:5:123:425:446 0 chr15 22809005 25 36M * 0 0 GTGCAGTCCACCAGCAAGGGAGCTGTGACGAGAGTA ]K\HLJGGFLGEIEB?FIBFEIDCC@>EA>@??>=? NM:i:1 X1:i:1 MD:Z:28C3A2A0
SOLEXA-1GA-2_2_FC20EMB:5:3:835:634 0 chr1 159002 25 36M * 0 0 ACACAAACACACAAACACACACACACACAACCCCAA ^ThgYWLRQNNWNHLQGMDIHIFLFKGI?DHAEDA@ NM:i:1 X1:i:1 MD:Z:13A15A6
SOLEXA-1GA-2_2_FC20EMB:5:19:384:574 0 chr1 540627 25 36M * 0 0 AAGGGGCGGCCTGTGGCGTTTTCCTGTAAAGTTGGG hhThhhVa\XOh_JVFDLLQZGCEDFE?B?DGAA>C NM:i:0 X0:i:1 MD:Z:36
Note that the header remains in the order chr3, chr15, chr1. After sorting, the reads follow that same order (chr3, chr15, chr1) and are sorted by position within each chromosome.
This provides further evidence that samtools sort has no built-in chromosome order (the result here is neither alphabetical nor numerical). The order of the @SQ lines in the header determines the chromosome order.
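To make the rule concrete, here is a minimal Python sketch of the ordering seen above: rank each chromosome by where its @SQ line sits in the header, then sort reads by (rank, position). This is only an illustration of the observed behaviour, not how samtools is implemented. The function name sort_by_header and the shortened read names r1..r6 are made up for the example, and it ignores unmapped reads and any tie-breaking that samtools applies.

```python
def sort_by_header(sam_lines):
    """Order alignment lines by @SQ rank in the header, then by POS.

    Illustrative only: assumes every read is mapped to a reference
    that is listed in an @SQ line.
    """
    header = [ln for ln in sam_lines if ln.startswith("@")]
    reads = [ln for ln in sam_lines if not ln.startswith("@")]

    # Rank each reference name by the position of its @SQ line.
    rank = {}
    for ln in header:
        fields = ln.split()
        if fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    rank[field[3:]] = len(rank)

    def sort_key(ln):
        cols = ln.split()
        # Column 3 is RNAME, column 4 is POS.
        return (rank[cols[2]], int(cols[3]))

    return header + sorted(reads, key=sort_key)


# Same header and coordinates as unsorted.sam, with shortened
# (hypothetical) read names and the trailing columns dropped.
example = [
    "@SQ SN:chr3 LN:198022430",
    "@SQ SN:chr15 LN:102531392",
    "@SQ SN:chr1 LN:249250621",
    "r1 0 chr1 540627 25 36M * 0 0",
    "r2 0 chr1 159002 25 36M * 0 0",
    "r3 0 chr15 22809005 25 36M * 0 0",
    "r4 0 chr15 22691475 25 36M * 0 0",
    "r5 0 chr3 870368 25 36M * 0 0",
    "r6 0 chr3 682558 25 36M * 0 0",
]

for line in sort_by_header(example):
    print(line)
```

Reordering the three @SQ lines in example and rerunning changes the chromosome order of the reads accordingly, which is the same dependence on the header that sorted.sam shows.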