说到哈夫曼编码,我们先了解一下什么是哈夫曼树:
哈夫曼树(Huffman Tree)是在叶子结点和权重确定的情况下,带权路径长度最小的二叉树,也被称为最优二叉树。
故依据哈夫曼树的带权路径长度最小的这个特点,我们可以实现对文件进行压缩。
一般情况下我们考虑对文本文件进行压缩,而很少考虑对图片、音频等进行压缩,对于此的原因这里我总结了几点如下:
- 文本文件中多是文字信息,而对于文字来说在目前一般的编码方式里(如utf-8)是以两个字节即16个bit二进制来进行存储的,故在编码时若文字字符对应的编码串长度小于16就能实现压缩
- 每个人写作都有一定的习惯,比如说我们总是在不经意间习惯性的对某些文字的使用频率大大高于其他文字的使用频率,同时经常使用的,这样就更有利于我们建立更短权值路径的哈夫曼树
- 对于图片、音频等文件,都是使用一个字节一个字节的进行存储,故要做到压缩,需要在编码时其对应的编码串长度小于8才能实现压缩的目的
- 图片、音频等文件,其文件中各字节出现的频次都非常接近,而这一点是非常不利于我们期望达成的压缩目的的
下面我们以一段代码来统计一下一个视频文件中各个字节出现的频次
public class Demo {
public static void main(String[] args) throws IOException {
FileInputStream fis=new FileInputStream("C:\\Users\\zyx\\Desktop\\4.mp4");
FileOutputStream fos=new FileOutputStream("C:\\Users\\zyx\\Desktop\\1.mp4");
int len;
byte[] buf=new byte[1024];
HashMap<Byte,Integer> map=new HashMap<>();
while((len=fis.read(buf))!=-1) {
for(int i=0;i<len;i++) {
if(map.get(buf[i])==null) {
map.put(buf[i], 1);
}else {
map.put( buf[i], map.get(buf[i])+1);
}
}
}
System.out.println(map);
}
}
运行结果如下:
可以明显的看到除了0以外的各个字节出现的频次都非常接近,故猜测我们在对这类文件进行编码压缩时,最终结果在很大概率上压缩效率会很低,甚至还可能发生膨胀的情况,这与我们期望的压缩的目的就背道而驰了。
了解了选择.txt文本文件的原因后,我们再来了解一下哈夫曼树的建树过程:
- 初始化各节点的权值信息
- 对节点集合进行排序,每次移除权值最小的两个节点,通过这两个节点生成一个父节点,父节点的权值等于两个子节点的权值之和,并分别指向这两个子节点,最后将父节点加入到节点集合中
- 重复2过程,直到节点数组中只剩一个节点,此节点就作为哈夫曼树的根节点
补充:可以发现在实际的压缩过程中,建树环节是最消耗时间的一块,因此对于上面的2过程,推荐使用快速排序的方式,这样相比于普通的冒泡排序的方式,效率的提升就非常明显了
看一下冒泡和快排部分的代码:
public void sort() {
Node dem;
for(int i=0;i<nodes.size()-1;i++) {
for(int j=0;j<nodes.size()-1-i;j++) {
if(nodes.get(j).weight>nodes.get(j+1).weight) {
dem=nodes.get(j);
nodes.set(j, nodes.get(j+1));
nodes.set(j+1, dem);
}
}
}
}
public void quickSort(int begin,int end) {
if(begin<end) {
Node temp=nodes.get(begin);
int i=begin;
int j=end;
while(i<j) {
while(i<j&&nodes.get(j).weight>temp.weight)
j--;
nodes.set(i, nodes.get(j));
while(i<j&&nodes.get(i).weight<=temp.weight)
i++;
nodes.set(j, nodes.get(i));
}
nodes.set(i, temp);
quickSort(begin, i-1);
quickSort(i+1, end);
}
}
上面的是普通冒泡排序的代码,下面的是快速排序的代码
下面通过程序运行来测试两种排序在压缩一个413KB的文本文件的耗时,运行结果如下:
冒泡的运行结果:
快排的运行结果:
由运行结果可见,由冒泡排序改为快排后效率的提升约8倍有余
再看完前言后,下面进入正式部分
一、实现思路
1、压缩:
- 通过IO流读取统计文本文件中每个字符出现的次数,将出现的次数作为权重初始化LinkedList节点集合
- 建树过程采用快排
- 使用StringBuilder记录总编码串,通过余运算得出需要补"0"的个数给编码串补上相应的"0"的个数使其能被8整除,将StringBuilder转化为一系列长度为8的String字符串,并通过自定义方法将其转换为相应的字节
- 二叉树遍历以及编码方案
- 读入一个需要压缩的文本文件,输出一个配置文件和一个压缩文件,配置文件用于存储保存了键值对的HashMap对象,压缩文件则存储压缩后的编码信息以及补0的个数的标记信息
2、解压
- 自定义负数字节转换为相应的int值的方法
- 通过对象流读取配置文件拿到对应的HashMap对象,再通过文件流经自定义方法配合Integer类中的静态方法toBinaryString(int i)经一系列处理还原得到之前的编码串
- 将得到的编码串经HashMap中的键值对信息一一比对并经IO流写出即可一个字节不差的还原为原来的文本文件
3、编码器界面
- 主要使用到java类库中的JFileChooser类分别完成对文本文件路径的选取,以及输出目录路径的选择
- 完成一定的布局,添加相应的监听器功能
- 对输入输出路径是否正确的检测以及调用压缩和解压两个类中的关键方法,成功压缩或解压弹出提示框
二、源码部分
1、Hfm类(压缩):
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class Hfm {
public static void main(String[] args) throws Exception {
Hfm h=new Hfm();
long start=System.currentTimeMillis();
h.start("C:\\Users\\zyx\\Desktop\\2.txt","C:\\Users\\zyx\\Desktop\\config.txt","C:\\Users\\zyx\\Desktop\\12b.txt");
long end=System.currentTimeMillis();
System.out.println("总体耗时"+(end-start)+"毫秒");
}
Node root;
List<Node> nodes=new LinkedList<Node>();
StringBuilder strb=new StringBuilder();
HashMap<Character, String> hashMap=new HashMap<>();
HashMap<String, Character> deMap=new HashMap<>();
StringBuilder strb2=new StringBuilder();
public void readFile(String path) throws Exception {
BufferedReader br=new BufferedReader(new InputStreamReader(new FileInputStream(path),"utf-8"));
char[] cbuf=new char[1024];
int len;
while((len=br.read(cbuf))!=-1) {
strb.append(new String(cbuf,0,len));
}
}
public void start(String path,String config,String outPath) throws Exception {
readFile(path);
HashMap<Character, Integer> map=new HashMap<>();
char[] arr=strb.toString().toCharArray();
for(char c:arr) {
if(map.get(c)==null) {
map.put(c, 1);
}else {
map.put(c, map.get(c)+1);
}
}
Set<Map.Entry<Character, Integer>> entrySet=map.entrySet();
for(Map.Entry<Character, Integer> entry: entrySet) {
Node node=new Node(entry.getValue(), entry.getKey());
nodes.add(node);
}
long start=System.currentTimeMillis();
while(nodes.size()>1) {
// sort();
quickSort(0, nodes.size()-1);
Node left=nodes.remove(0);
Node right=nodes.remove(0);
Node node=new Node(left.weight+right.weight);
node.left=left;
node.right=right;
nodes.add(node);
}
long end=System.currentTimeMillis();
System.out.println("建树过程耗时"+(end-start)+"毫秒");
root=nodes.get(0);
order(root);
for(char c:arr) {
strb2.append(hashMap.get(c));
}
int last=strb2.length()%8;
for(int i=0;i<8-last;i++) {
strb2.append(0);
}
int flag=strb2.length()/8;
String s=strb2.toString();
FileOutputStream fos=new FileOutputStream(outPath);
BufferedOutputStream bos=new BufferedOutputStream(fos);
for(int i=0;i<flag;i++) {
String str=s.substring(0+i*8, 8+i*8);
bos.write(toByte(str));
}
bos.write(8-last);
bos.close();
ObjectOutputStream oos=new ObjectOutputStream(new FileOutputStream(config));
oos.writeObject(deMap);
}
public static byte toByte(String str) {
char[] c=str.toCharArray();
int num=0;
for(int i=0;i<8;i++) {
switch(i) {
case 0:num+=(c[i]-48)*128;break;
case 1:num+=(c[i]-48)*64;break;
case 2:num+=(c[i]-48)*32;break;
case 3:num+=(c[i]-48)*16;break;
case 4:num+=(c[i]-48)*8;break;
case 5:num+=(c[i]-48)*4;break;
case 6:num+=(c[i]-48)*2;break;
case 7:num+=(c[i]-48)*1;break;
}
}
return (byte)num;
}
public void order(Node node) {
if(node==null)
return;
setCode(node);
order(node.left);
order(node.right);
}
public void setCode(Node node) {
if(node.left!=null) {
node.left.code=node.code+0;
}
if(node.right!=null) {
node.right.code=node.code+1;
}
if(node.left==null&&node.right==null) {
hashMap.put(node.name, node.code);
deMap.put(node.code, node.name);
}
}
public void quickSort(int begin,int end) {
if(begin<end) {
Node temp=nodes.get(begin);
int i=begin;
int j=end;
while(i<j) {
while(i<j&&nodes.get(j).weight>temp.weight)
j--;
nodes.set(i, nodes.get(j));
while(i<j&&nodes.get(i).weight<=temp.weight)
i++;
nodes.set(j, nodes.get(i));
}
nodes.set(i, temp);
quickSort(begin, i-1);
quickSort(i+1, end);
}
}
public void sort() {
Node dem;
for(int i=0;i<nodes.size()-1;i++) {
for(int j=0;j<nodes.size()-1-i;j++) {
if(nodes.get(j).weight>nodes.get(j+1).weight) {
dem=nodes.get(j);
nodes.set(j, nodes.get(j+1));
nodes.set(j+1, dem);
}
}
}
}
}
class Node {
int weight;
Node left;
Node right;
String code="";
char name;
Node(int weight){
this.weight=weight;
}
Node(int weight,char name){
this.weight=weight;
this.name=name;
}
}
2、Decompression类(解压):
import java.io.BufferedInputStream;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.OutputStreamWriter;
import java.util.HashMap;
public class Decompression {
public static void main(String[] args) {
Decompression d=new Decompression();
try {
d.readConfig("C:\\Users\\zyx\\Desktop\\12b.txt", "C:\\Users\\zyx\\Desktop\\config.txt","C:\\Users\\zyx\\Desktop\\12c.txt");
} catch (Exception e) {
e.printStackTrace();
}
}
HashMap<String, Character> map;
StringBuilder strb=new StringBuilder();
public int toInt(byte b) {
int num=0;
if(b>=0&&b<128) {
num=b;
} else if(b<0) {
num=(255-(-b))+1;
}
return num;
}
@SuppressWarnings("unchecked")
public void readConfig(String path,String config,String outPath) throws Exception {
ObjectInputStream ois = null;
BufferedInputStream bis = null;
BufferedWriter bw = null;
try {
ois = new ObjectInputStream(new FileInputStream(config));
map = (HashMap<String, Character>) ois.readObject();
System.out.println(map);
FileInputStream fis = new FileInputStream(path);
bis = new BufferedInputStream(fis);
int len;
byte[] buf = new byte[1024];
int num;
int flag;
while ((len = bis.read(buf)) != -1) {
for (int i = 0; i < len; i++) {
num = toInt(buf[i]);
flag = Integer.toBinaryString(num).length();
for (int j = 0; j < 8 - flag; j++) {
strb.append("0");
}
strb.append(Integer.toBinaryString(num));
}
}
int tab=Hfm.toByte(strb.substring(strb.length()-8, strb.length()));
strb.delete(strb.length()-8-tab, strb.length());
bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outPath),"utf-8"));
StringBuilder strb1 = new StringBuilder();
int i = 0;
while (true) {
if (strb.length() == 0)
break;
strb1.append(strb.charAt(i++));
Character c = map.get(strb1.toString());
if (c != null) {
bw.write(c);
strb.delete(0, strb1.length());
strb1.delete(0, strb1.length());
i = 0;
}
}
} catch (Exception e) {
throw new Exception();
} finally {
if(bw!=null) {
try {
bw.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
if(bis!=null) {
try {
bis.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
if(ois!=null) {
try {
ois.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
}
3、EncodeUI(编码器界面):
import javax.swing.*;
import javax.swing.filechooser.FileNameExtensionFilter;
import java.awt.*;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.io.File;
import java.io.IOException;
public class EncodeUI extends JFrame{
public static void main(String[] args) {
EncodeUI ui=new EncodeUI();
ui.init();
}
public void init() {
this.setTitle("哈夫曼编码器");
this.setDefaultCloseOperation(3);
this.setBounds(800, 300, 300, 400);
JPanel jp1=new JPanel();
JPanel jp2=new JPanel();
JPanel jp3=new JPanel();
this.add(jp1,BorderLayout.NORTH);
this.add(jp2,BorderLayout.CENTER);
this.add(jp3,BorderLayout.SOUTH);
jp1.setPreferredSize(new Dimension(0,80));
jp3.setPreferredSize(new Dimension(0,180));
JButton jb1=new JButton();
JButton jb2=new JButton();
jb1.setText("编码压缩");
jb2.setText("解码还原");
Dimension dm=new Dimension(200,50);
jb1.setPreferredSize(dm);
jb2.setPreferredSize(dm);
jp2.add(jb1);
jp3.add(jb2);
this.setVisible(true);
jb1.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
compressUI();
}
});
jb2.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
decompressionUI();
}
});
}
void compressUI() {
JFrame jf=new JFrame();
jf.setTitle("compress");
jf.setDefaultCloseOperation(3);
jf.setBounds(500, 200, 650, 400);
JPanel jp1=new JPanel();
jp1.setPreferredSize(new Dimension(0,20));
jf.add(jp1,BorderLayout.NORTH);
JPanel jp2=new JPanel();
jf.add(jp2,BorderLayout.CENTER);
JButton jb2=new JButton();
jb2.setPreferredSize(new Dimension(170,40));
jb2.setText("选择需要压缩的文件");
jp2.add(jb2);
JTextField jtf=new JTextField();
jtf.setPreferredSize(new Dimension(400,40));
jtf.setEditable(false);
jp2.add(jtf);
JFileChooser chooser = new JFileChooser();
FileNameExtensionFilter filter = new FileNameExtensionFilter("TXT", "txt");
chooser.setFileFilter(filter);
jb2.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
int returnVal = chooser.showOpenDialog(jf);
if(returnVal == JFileChooser.APPROVE_OPTION) {
System.out.println("You chose to open this file: " +
chooser.getSelectedFile());
jtf.setText(""+chooser.getSelectedFile());
}
}
});
JButton jb3=new JButton();
jb3.setPreferredSize(new Dimension(170,40));
jb3.setText("Config Path");
jp2.add(jb3);
JTextField jtf1=new JTextField();
jtf1.setPreferredSize(new Dimension(250,40));
jtf1.setEditable(false);
jp2.add(jtf1);
JTextField jtf1a=new JTextField();
jtf1a.setPreferredSize(new Dimension(150,40));
jp2.add(jtf1a);
JFileChooser chooser1 = new JFileChooser();
chooser1.setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY);
jb3.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
int result = chooser1.showOpenDialog(null);
String fname = chooser1.getName(chooser1.getSelectedFile());
if (result == JFileChooser.APPROVE_OPTION) {
String filePath = chooser1.getSelectedFile().getPath();
jtf1.setText(filePath+"\\");
}
}
});
JButton jb4=new JButton();
jb4.setPreferredSize(new Dimension(170,40));
jb4.setText("Output path");
jp2.add(jb4);
JTextField jtf2=new JTextField();
jtf2.setPreferredSize(new Dimension(250,40));
jtf2.setEditable(false);
jp2.add(jtf2);
JTextField jtf2a=new JTextField();
jtf2a.setPreferredSize(new Dimension(150,40));
jp2.add(jtf2a);
jb4.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
int result = chooser1.showOpenDialog(null);
String fname = chooser1.getName(chooser1.getSelectedFile());
if (result == JFileChooser.APPROVE_OPTION) {
String filePath = chooser1.getSelectedFile().getPath();
jtf2.setText(filePath+"\\");
}
}
});
JPanel jp=new JPanel();
jp.setPreferredSize(new Dimension(0,100));
jf.add(jp,BorderLayout.SOUTH);
JButton jb=new JButton();
jb.setText("压缩");
jb.setPreferredSize(new Dimension(100,30));
jp.add(jb);
jb.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
String filePath=jtf.getText();
String configPath=jtf1.getText()+jtf1a.getText();
String outputPath=jtf2.getText()+jtf2a.getText();
File f1=new File(filePath);
File f2=new File(configPath);
File f3=new File(outputPath);
try {
f2.createNewFile();
f3.createNewFile();
} catch (IOException e1) {
// TODO Auto-generated catch block
JOptionPane.showMessageDialog(null, "Your input is "+"error", "", JOptionPane.PLAIN_MESSAGE);
return;
}
if(!f1.isFile()||!f2.isFile()||!f3.isFile()) {
JOptionPane.showMessageDialog(null, "Your input is "+"error", "", JOptionPane.PLAIN_MESSAGE);
return;
}
long start=System.currentTimeMillis();
try {
new Hfm().start(filePath, configPath, outputPath);
} catch (Exception e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
return;
}
long end=System.currentTimeMillis();
int time=(int) (start-end);
JOptionPane.showMessageDialog(null, "压缩成功,共耗时"+time+"毫秒","", JOptionPane.PLAIN_MESSAGE);
}
});
JButton jb1=new JButton();
jb1.setText("返回");
jb1.setPreferredSize(new Dimension(100,30));
jp.add(jb1);
jf.setVisible(true);
jb1.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
jf.dispose();
}
});
}
void decompressionUI() {
JFrame jf=new JFrame();
jf.setTitle("decompression");
jf.setDefaultCloseOperation(3);
jf.setBounds(500, 200, 650, 400);
JPanel jp1=new JPanel();
jp1.setPreferredSize(new Dimension(0,20));
jf.add(jp1,BorderLayout.NORTH);
JPanel jp2=new JPanel();
jf.add(jp2,BorderLayout.CENTER);
JButton jb2=new JButton();
jb2.setPreferredSize(new Dimension(170,40));
jb2.setText("选择需要解压的文件");
jp2.add(jb2);
JTextField jtf=new JTextField();
jtf.setPreferredSize(new Dimension(400,40));
jtf.setEditable(false);
jp2.add(jtf);
JFileChooser chooser = new JFileChooser();
FileNameExtensionFilter filter = new FileNameExtensionFilter("TXT", "txt");
chooser.setFileFilter(filter);
jb2.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
int returnVal = chooser.showOpenDialog(jf);
if(returnVal == JFileChooser.APPROVE_OPTION) {
System.out.println("You chose to open this file: " +
chooser.getSelectedFile());
jtf.setText(""+chooser.getSelectedFile());
}
}
});
JButton jb3=new JButton();
jb3.setPreferredSize(new Dimension(170,40));
jb3.setText("Config Path");
jp2.add(jb3);
JTextField jtf1=new JTextField();
jtf1.setPreferredSize(new Dimension(400,40));
jtf1.setEditable(false);
jp2.add(jtf1);
jb3.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
int returnVal = chooser.showOpenDialog(jf);
if(returnVal == JFileChooser.APPROVE_OPTION) {
System.out.println("You chose to open this file: " +
chooser.getSelectedFile());
jtf1.setText(""+chooser.getSelectedFile());
}
}
});
JButton jb4=new JButton();
jb4.setPreferredSize(new Dimension(170,40));
jb4.setText("Output path");
jp2.add(jb4);
JTextField jtf2=new JTextField();
jtf2.setPreferredSize(new Dimension(250,40));
jtf2.setEditable(false);
jp2.add(jtf2);
JTextField jtf2a=new JTextField();
jtf2a.setPreferredSize(new Dimension(150,40));
jp2.add(jtf2a);
JFileChooser chooser1 = new JFileChooser();
chooser1.setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY);
jb4.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
int result = chooser1.showOpenDialog(null);
String fname = chooser1.getName(chooser1.getSelectedFile());
if (result == JFileChooser.APPROVE_OPTION) {
String filePath = chooser1.getSelectedFile().getPath();
jtf2.setText(filePath+"\\");
}
}
});
JPanel jp=new JPanel();
jp.setPreferredSize(new Dimension(0,100));
jf.add(jp,BorderLayout.SOUTH);
JButton jb=new JButton();
jb.setText("解压");
jb.setPreferredSize(new Dimension(100,30));
jp.add(jb);
jb.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
String filePath=jtf.getText();
String configPath=jtf1.getText();
String outputPath=jtf2.getText()+jtf2a.getText();
File f1=new File(filePath);
File f2=new File(configPath);
File f3=new File(outputPath);
try {
f2.createNewFile();
f3.createNewFile();
} catch (IOException e1) {
// TODO Auto-generated catch block
JOptionPane.showMessageDialog(null, "Your input is "+"error", "", JOptionPane.PLAIN_MESSAGE);
return;
}
if(!f1.isFile()||!f2.isFile()||!f3.isFile()) {
JOptionPane.showMessageDialog(null, "Your input is "+"error", "", JOptionPane.PLAIN_MESSAGE);
return;
}
long start=System.currentTimeMillis();
try {
new Decompression().readConfig(filePath, configPath, outputPath);
} catch (Exception e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
return;
}
long end=System.currentTimeMillis();
int time=(int) (start-end);
JOptionPane.showMessageDialog(null, "解压成功,共耗时"+time+"毫秒","", JOptionPane.PLAIN_MESSAGE);
}
});
JButton jb1=new JButton();
jb1.setText("返回");
jb1.setPreferredSize(new Dimension(100,30));
jp.add(jb1);
jf.setVisible(true);
jb1.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
jf.dispose();
}
});
}
}
三、效果展示
1、对此文件进行压缩
2、界面效果
3、选取文件路径
4、输出路径
5、压缩操作
6、压缩效果查看
7、 解压操作
8、还原效果查看