I was just experimenting with the archive/tar and compress/gzip packages to automate handling of some of my backups.
My problem is this: I have various .tar and .tar.gz files floating around, and I want to get the MD5 hash of each .tar.gz file and the MD5 hash of the .tar inside it, ideally in one pass.
The sample code I have so far correctly computes the hashes of the files inside the .tar.gz, as well as the hash of the .tar.gz itself, but the hash of the .tar is wrong, and I cannot figure out why.
I looked at archive/tar/reader.go and saw that it skips over some data, but I thought everything still goes through the io.Reader interface, so the TeeReader should still see every byte.
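```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	tgz, err := os.Open("tb.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer tgz.Close()

	// Hash the compressed bytes as gzip reads them.
	gzMd5 := md5.New()
	gz, err := gzip.NewReader(io.TeeReader(tgz, gzMd5))
	if err != nil {
		log.Fatal(err)
	}

	// Hash the decompressed (tar) bytes as the tar reader consumes them.
	tarMd5 := md5.New()
	tr := tar.NewReader(io.TeeReader(gz, tarMd5))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Hash the contents of each file in the archive.
		fileMd5 := md5.New()
		if _, err := io.Copy(fileMd5, tr); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%x %s\n", fileMd5.Sum(nil), hdr.Name)
	}
	fmt.Printf("%x tb.tar\n", tarMd5.Sum(nil))
	fmt.Printf("%x tb.tar.gz\n", gzMd5.Sum(nil))
}
```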
Now take the following example:

```
$ echo "a" > a.txt
$ echo "b" > b.txt
$ tar cf tb.tar a.txt b.txt
$ gzip -c tb.tar > tb.tar.gz
$ md5sum a.txt b.txt tb.tar tb.tar.gz
60b725f10c9c85c70d97880dfe8191b3 a.txt
3b5d5c3712955042212316173ccf37be b.txt
501352dcd8fbd0b8e3e887f7dafd9392 tb.tar
90d6ba204493d8e54d3b3b155bb7f370 tb.tar.gz
```
On Linux Mint 14 (Ubuntu 12.04) with the Ubuntu-packaged Go 1.0.2:

```
$ go run tarmd5.go
60b725f10c9c85c70d97880dfe8191b3 a.txt
3b5d5c3712955042212316173ccf37be b.txt
a26ddab1c324780ccb5199ef4dc38691 tb.tar
90d6ba204493d8e54d3b3b155bb7f370 tb.tar.gz
```
As you can see, every hash matches except the one for tb.tar.
(, , .tar .tar.gz , - )
I would like to do all of this in one pass (hence the TeeReaders).