python统计文本字符串里单词出现频率的方法

yipeiwu_com6年前Python基础

本文实例讲述了python统计文本字符串里单词出现频率的方法。分享给大家供大家参考。具体实现方法如下:

# word frequency in a text
# tested with Python24  vegaseat  25aug2005
# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""
print "Original string:"
print str1
print
# create a list of words separated at whitespaces
wordList1 = str1.split(None)
# strip any punctuation marks and build modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
  # last character of each word
  lastchar = word1[-1:]
  # use a list of punctuation marks
  if lastchar in [",", ".", "!", "?", ";"]:
    word2 = word1.rstrip(lastchar)
  else:
    word2 = word1
  # build a wordList of lower case modified words
  wordList2.append(word2.lower())
print "Word list created from modified string:"
print wordList2
print
# create a wordfrequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
  freqD2[word2] = freqD2.get(word2, 0) + 1
# create a list of keys and sort the list
# all words are lower case already
keyList = freqD2.keys()
keyList.sort()
print "Frequency of each word in the word list (sorted):"
for key2 in keyList:
 print "%-10s %d" % (key2, freqD2[key2])

希望本文所述对大家的Python程序设计有所帮助。

相关文章

Python告诉你木马程序的键盘记录原理

前言 Python keylogger键盘记录的功能的实现主要利用了pythoncom及pythonhook,然后就是对windows API的各种调用。Python之所以用起来方便快...

简单的Python调度器Schedule详解

最近在做项目的时候经常会用到定时任务,由于我的项目是使用Java来开发,用的是SpringBoot框架,因此要实现这个定时任务其实并不难。 后来我在想如果我要在Python中实现,我要怎...

详解python中的hashlib模块的使用

hashlib hashlib主要提供字符加密功能,将md5和sha模块整合到了一起,支持md5,sha1, sha224, sha256, sha384, sha512等算法 hash...

Python中pandas模块DataFrame创建方法示例

本文实例讲述了Python中pandas模块DataFrame创建方法。分享给大家供大家参考,具体如下: DataFrame创建 1. 通过列表创建DataFrame 2. 通过字典创...

Django 生成登陆验证码代码分享

Django 生成登陆验证码代码分享

环境准备 python3.52 pycharm5.05 Pillow 自制的验证码工具包/utils/check_code 验证码的作用 防恶意破解密码:防止,使用程序或...