经常面对这类任务?
- 对着500多个Excel做同样的汇总统计
- 去除文本文件里的重复行
- 对比两个csv有啥差异
- 几十个Excel合并起来,把大Excel拆成多个小的
- 几个Excel的数据要用同一个列关联起来
| 1 | import pandas as pd |
| 2 | def iterate(col): |
| 3 | prev = 0; |
| 4 | res = 0; |
| 5 | val = 0; |
| 6 | for curr in col: |
| 7 | if curr – prev > 0: |
| 8 | res += 1; |
| 9 | else: |
| 10 | res = 0; |
| 11 | prev = curr; |
| 12 | if val < res: |
| 13 | val = res; |
| 14 | return val; |
| 15 | data = pd.read_excel('D:/Stock.xlsx',sheet_name=0). sort_values('Date').groupby('Company')['Price'].apply(iterate); |
| A | |
|---|---|
| 1 | =file("D:/Stock.xlsx").xlsimport@t().sort(Date).group(Company) |
| 2 | =0 |
| 3 | =A1.max(A2=if(Price>Price[-1],A2+1,0)) |
集合运算领域专业语法,同样过程代码更简洁!
| A | |
|---|---|
| 1 | $SELECT * FROM D:/Stock.xlsx WHERE Company='0001' ORDER BY date |
| 2 | $SELECT Company,max(price),min(price) from D:/Stock.xls WHERE month(date)=1 GROUP BY Company |
| 3 | $SELECT Company.Name,Stock.date,Stock.price FROM D:/Stock.xls Stock LEFT JOIN D:/Company.txt Company ON Stock.company=Company.ID |
| Old.csv | New.csv | |
| userName,date,saleValue,saleCount | userName,date,saleValue,saleCount | |
|---|---|---|
| 1 | Rachel,2015-03-01,4500,9 | Rachel,2015-03-01,4500,9 |
| 2 | Rachel,2015-03-03,8700,4 | Rachel,2015-03-02,5000,5 |
| 3 | Tom,2015-03-02,3000,8 | Ashley,2015-03-01,6000,5 |
| 4 | Tom,2015-03-03,5000,7 | Rachel,2015-03-03,11700,4 |
| 5 | Tom,2015-03-04,6000,12 | Tom,2015-03-03,5000,7 |
| 6 | John,2015-03-04,4800,4 |
| A | B | C | |
|---|---|---|---|
| 1 | =file("d:\\old.csv").import@ct() | =file("d:\\new.csv").import@ct() | /逗号分隔的文本 |
| 2 | =new=[B1,A1].merge@od() | /求差集 |
| Customer ID | Customer Name | Invoice Number | Amount | Purchase Date |
|---|---|---|---|---|
| 1234 | John Smith | 100-0002 | $1,200.00 | 2013/1/1 |
| 2345 | Mary Harrison | 100-0003 | $1,425.00 | 2013/1/6 |
| 3456 | Lucy Gomez | 100-0004 | $1,390.00 | 2013/1/11 |
| 4567 | Rupert Jones | 100-0005 | $1,257.00 | 2013/1/18 |
| 5678 | Jenny Walters | 100-0006 | $1,725.00 | 2013/1/24 |
| 6789 | Samantha Donaldson | 100-0007 | $1,995.00 | 2013/1/31 |
| A | B | C | |
|---|---|---|---|
| 1 | for directory@p("d:/excel/*.xlsx") | =file(A1).xlsopen() | /循环目录下每个Excel |
| 2 | =B1.conj(B1.xlsimport@t(;~.stname)) | /循环每个sheet | |
| 3 | =@|B2 | /依次合并sheet | |
| 4 | =A4.groups (month('Purchase Date'):Month;sum(Amount):Total,avg(Amount):Average) |
/分组汇总 | |