Python example code for efficient deduplication of large (GB-scale) files

yipeiwu_com, 6 years ago (2020-03-06), Python Basics

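The script below reads a text file, extracts its tokens with a regular expression, removes duplicates with a set, and writes the result one token per line: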

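#coding=utf-8

import sys, re, os

def getDictList(path):
  # A token is a run of word characters or the listed ASCII punctuation;
  # whitespace, quotes, '|' and '\' act as separators.
  regx = r'''[\w\~`\!\@\#\$\%\^\&\*\(\)\_\-\+\=\[\]\{\}\:\;\,\.\/\<\>\?]+'''
  # Note: f.read() loads the whole file into memory at once.
  with open(path) as f:
    data = f.read()
  return re.findall(regx, data)

def rmdp(dictList):
  # set() drops duplicates; the original order is not preserved.
  return list(set(dictList))

def fileSave(dictRmdp, out):
  # 'a' appends, so an existing output file is extended, not overwritten.
  with open(out, 'a') as f:
    for line in dictRmdp:
      f.write(line + '\n')

def main():
  try:
    src = sys.argv[1].strip()
    out = sys.argv[2].strip()
  except IndexError as e:
    # Too few command-line arguments: show usage and stop.
    print('error:', e)
    me = os.path.basename(__file__)
    print('usage: %s <input> <output>' % me)
    print('example: %s dict.txt dict_rmdp.txt' % me)
    sys.exit(1)

  dictList = getDictList(src)
  dictRmdp = rmdp(dictList)
  fileSave(dictRmdp, out)

if __name__ == '__main__':
  main()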
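
The script above reads the entire input with f.read() and keeps the full token list in memory before deduplicating, so for GB-scale inputs peak memory is at least the size of the file. As a rough sketch of one alternative (not the original author's method), the file can be streamed line by line, since no token can contain a newline. The names iter_unique_tokens and dedupe_file are hypothetical, and the UTF-8 encoding and the overwrite mode 'w' are assumptions. Memory use still grows with the number of distinct tokens, because the set of seen tokens must be kept.

```python
import re

# Same token pattern as the script above, written as a raw string.
TOKEN_RE = re.compile(
    r'''[\w\~`\!\@\#\$\%\^\&\*\(\)\_\-\+\=\[\]\{\}\:\;\,\.\/\<\>\?]+''')


def iter_unique_tokens(lines):
    """Yield each token the first time it appears, in input order."""
    seen = set()
    for line in lines:
        for token in TOKEN_RE.findall(line):
            if token not in seen:
                seen.add(token)
                yield token


def dedupe_file(src, out):
    """Stream src line by line, write its unique tokens to out, return the count.

    Assumptions: the input is UTF-8 text, and out is overwritten rather
    than appended to.
    """
    count = 0
    with open(src, encoding='utf-8') as fin, \
         open(out, 'w', encoding='utf-8') as fout:
        # Iterating over the file object reads one line at a time.
        for token in iter_unique_tokens(fin):
            fout.write(token + '\n')
            count += 1
    return count
```

Calling dedupe_file('dict.txt', 'dict_rmdp.txt') would mirror the example invocation in the usage message. Unlike list(set(...)), this version keeps tokens in the order of their first appearance.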


© YiPeiWu.com 【宜配屋】粤ICP备17031333号

Powered By Z-BlogPHP. Theme by TOYEAN.
