How Java Can Detect and Read Text Files in Different Encodings

    Technology  2022-05-14

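    Most people know that a txt file, as commonly saved by tools such as Windows Notepad, is usually in one of four encodings: "GBK", "UTF-8", "Unicode" (UTF-16 little-endian), or "UTF-16BE". The Unicode variants can be told apart by the byte order mark (BOM) written at the start of the file: EF BB BF for UTF-8, FF FE for UTF-16 little-endian, and FE FF for UTF-16 big-endian. GBK has no marker, and a UTF-8 file may omit its BOM, so a file without one can only be guessed at. To avoid garbled text, read the file header before reading the content and choose the decoding charset accordingly. A method for this follows.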

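    import java.io.BufferedInputStream;
    import java.io.FileInputStream;

    /**
     * Determines a file's encoding from its first two bytes.
     *
     * @param fileName path of the file to inspect
     * @return name of the detected encoding
     * @throws Exception if the file cannot be opened or read
     */
    public static String codeString(String fileName) throws Exception {
        // try-with-resources closes the stream once the header has been read
        try (BufferedInputStream bin = new BufferedInputStream(
                new FileInputStream(fileName))) {
            int p = (bin.read() << 8) + bin.read();
            String code;
            switch (p) {
                case 0xefbb: // first two bytes of the UTF-8 BOM (EF BB BF)
                    code = "UTF-8";
                    break;
                case 0xfffe: // little-endian BOM; Notepad labels this "Unicode"
                    code = "UTF-16LE";
                    break;
                case 0xfeff: // big-endian BOM
                    code = "UTF-16BE";
                    break;
                default:     // no BOM: assume GBK (BOM-less UTF-8 also lands here)
                    code = "GBK";
            }
            return code;
        }
    }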
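The two-byte check above accepts any file that starts with EF BB as UTF-8. As a minimal sketch of a slightly stricter variant, the class below compares all three bytes of the UTF-8 BOM and works on a byte array so it can be exercised without touching the disk. The names `BomSniffer`, `detect`, and `detectFile` are hypothetical and not part of the original article; `InputStream.readNBytes` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BomSniffer {
    /** Maps the first bytes of a file to a Java charset name; falls back to GBK. */
    static String detect(byte[] head, int len) {
        if (len >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (len >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (len >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        // No BOM found: same default as the article, which is only a guess.
        return "GBK";
    }

    /** Reads at most three bytes from the file and classifies them. */
    static String detectFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = new byte[3];
            int len = in.readNBytes(head, 0, 3); // Java 9+
            return detect(head, len);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        // FF FE followed by 'h' encoded as UTF-16LE
        Files.write(tmp, new byte[] {(byte) 0xFF, (byte) 0xFE, 'h', 0});
        System.out.println(detectFile(tmp)); // prints UTF-16LE
        Files.delete(tmp);
    }
}
```

Like the original method, this sketch cannot distinguish GBK from UTF-8 saved without a BOM; doing so would require inspecting the content itself.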

    Then read the text as a character stream, using the detected encoding:

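    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;

    public static String readText(String file, String code) throws Exception {
        StringBuilder sBuffer = new StringBuilder();
        // code is the encoding returned by the method above
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), code))) {
            String strTmp;
            // read line by line
            while ((strTmp = in.readLine()) != null) {
                // readLine() drops the line terminator, so add it back
                sBuffer.append(strTmp).append("\n");
            }
        }
        return sBuffer.toString();
    }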
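One detail worth knowing when reading this way: Java's UTF-8, UTF-16LE, and UTF-16BE decoders do not discard the BOM, so the decoded text starts with the character U+FEFF. The following sketch, which assumes the charset name has already been determined (for example by the detection method shown earlier), decodes the whole file and drops that leading character. The class and method names `BomAwareReader` and `readWithoutBom` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class BomAwareReader {
    /** Decodes the file with the given charset and removes a leading U+FEFF, if any. */
    static String readWithoutBom(Path path, String charsetName) throws IOException {
        String text = new String(Files.readAllBytes(path), Charset.forName(charsetName));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom-read", ".txt");
        // UTF-8 BOM followed by "hi\n"
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i', '\n'});
        String text = readWithoutBom(tmp, "UTF-8");
        System.out.println(text.length()); // prints 3: 'h', 'i', '\n'
        Files.delete(tmp);
    }
}
```

Reading the whole file into memory keeps the sketch short; for large files the line-by-line reader is the better fit, with the same check applied to the first line only.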

