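1. Scraping the AKB48 general election data
-1-1. Read the data from the Web
-1-2. Select only the information needed (vote counts)
-1-3. Select only the information needed (names)
-1-4. Build a data frame

2. Scraping the 2017 general election data (Tokyo)
-2-1. Read the data from the Web
-2-2. Select only the information needed (names)
-2-3. Select only the information needed (ages)
-2-4. Select only the information needed (vote counts)
-2-5. Select only the information needed (parties)
-2-6. Select only the information needed (number of times elected)
-2-7. Select only the information needed (status)
-2-8. Build a data frame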

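1. Scraping the AKB48 general election data

1-1. Read the data from the Web

・With R, you can read text and numbers from a website and build a data frame from them.
・Here we use R to read the data from the AKB48 general election page (http://www.akb48.co.jp/sousenkyo_45th/result.php).

・The AKB48 general election page presents the data as follows.

・For example, the data for the first-place member, Rino Sashihara (指原 莉乃), is displayed as follows.

・Check the page source that corresponds to this display.
・In the browser, choose "View" → "Developer" → "View Source".

・Use R to connect to the website, fetch the whole page source at once, and name it result.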

result = readLines("http://www.akb48.co.jp/sousenkyo_45th/result.php", encoding = "UTF-8")

・Display the first 6 lines of result (head() returns 6 elements by default).

head(result)
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"                                                                                   
[2] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
[3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"ja\" xml:lang=\"ja\" xmlns:og=\"http://ogp.me/ns#\">"                     
[4] "<head>"                                                                                                                       
[5] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />"                                                    
[6] "<meta http-equiv=\"Content-Style-Type\" content=\"text/css\" />"                                                              
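
The reading step can be tried offline with a minimal sketch like the one below. It assumes a made-up character vector (sample_html, not taken from the real page) in place of the URL; readLines() accepts any connection, so a textConnection behaves much like the remote page here.

```r
# Hypothetical stand-in for a downloaded page (NOT the real site contents).
sample_html <- c(
  "<html>",
  "<head>",
  "<title>sample</title>",
  "</head>",
  "<body>",
  "<p class=\"result_count\">243,011</p>",
  "<p class=\"result_count\">175,613</p>",
  "</body>",
  "</html>"
)

# readLines() works on any connection; here a text connection replaces the URL.
con <- textConnection(sample_html)
page <- readLines(con)
close(con)

length(page)        # number of lines read: one element per source line
head(page)          # first 6 lines (the default)
head(page, n = 8)   # pass n explicitly to see the first 8 lines
```

Each element of the returned character vector is one line of the page source, which is what makes the line-based grep() filtering in the next subsection possible.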

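1-2. Select only the information needed (vote counts)

・Look at "View Source" for the AKB48 general election page.
・The information we want (vote counts) follows the string result_count.

・First, use grep() to find the lines of the page source result that contain vote counts.
→ grep() returns the indices of the matching lines; name them result_count.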

result_count = grep("result_count", result)

・Check the first 6 matching lines to confirm the extraction worked.

head(result[result_count])
[1] "\t\t                    <p class=\"result_count\">243,011票</p>"
[2] "\t\t                    <p class=\"result_count\">175,613票</p>"
[3] "\t\t                    <p class=\"result_count\">112,341票</p>"
[4] "\t\t                    <p class=\"result_count\">110,411票</p>"
[5] "\t\t                    <p class=\"result_count\">92,110票</p>" 
[6] "\t\t                    <p class=\"result_count\">78,279票</p>" 
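
As a small illustration of what grep() returns, the sketch below uses a hand-written vector (page) that imitates the page source. The result_box wrapper line is a hypothetical placeholder, not something confirmed from the real page.

```r
# Hand-written lines imitating the page source; "result_box" is hypothetical.
page <- c(
  "<div class=\"result_box\">",
  "  <h4 class=\"result_name\">指原 莉乃</h4>",
  "  <p class=\"result_count\">243,011票</p>",
  "</div>",
  "  <h4 class=\"result_name\">渡辺 麻友</h4>",
  "  <p class=\"result_count\">175,613票</p>"
)

# By default grep() returns the positions of the matching elements.
idx <- grep("result_count", page)
idx                    # 3 6

# Indexing with those positions gives the matching lines themselves.
page[idx]

# value = TRUE returns the matching lines directly.
matched <- grep("result_count", page, value = TRUE)

# grepl() returns one TRUE/FALSE per element instead.
is_count <- grepl("result_count", page)
```

Keeping the indices (as the text does) is convenient when the same positions are reused; value = TRUE is shorter when only the lines are needed.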

・Use gsub() to remove everything except the vote count.

gsub(".*<p class=\"result_count\">(.*)票</p>","\\1", result[result_count])
 [1] "243,011" "175,613" "112,341" "110,411" "92,110"  "78,279"  "69,159" 
 [8] "68,126"  "60,591"  "58,624"  "58,610"  "50,190"  "47,094"  "43,318" 
[15] "40,648"  "40,071"  "40,011"  "36,894"  "33,524"  "33,176"  "32,886" 
[22] "32,118"  "31,314"  "29,983"  "29,517"  "29,333"  "29,213"  "28,706" 
[29] "28,553"  "28,369"  "28,282"  "28,260"  "27,487"  "26,152"  "25,963" 
[36] "25,613"  "25,039"  "24,059"  "23,251"  "22,995"  "22,429"  "21,881" 
[43] "21,864"  "21,559"  "21,009"  "20,980"  "20,913"  "20,643"  "20,618" 
[50] "20,228"  "20,021"  "19,534"  "19,377"  "19,326"  "19,274"  "19,140" 
[57] "18,524"  "17,898"  "16,839"  "16,691"  "16,548"  "16,246"  "15,994" 
[64] "15,793"  "15,716"  "15,697"  "15,600"  "15,057"  "14,950"  "14,913" 
[71] "14,550"  "14,544"  "14,177"  "13,882"  "13,657"  "13,571"  "13,512" 
[78] "13,366"  "13,204"  "13,058" 

・With commas in them, the values remain character strings, so:
 → use gsub() to remove the commas,
 → use as.integer() to convert the strings to integers,
→ and name the resulting integer vector akb_count.

akb_count = as.integer(gsub(",", "",
                            gsub(".*<p class=\"result_count\">(.*)票</p>","\\1", result[result_count])))

・Check the result.

akb_count
 [1] 243011 175613 112341 110411  92110  78279  69159  68126  60591  58624
[11]  58610  50190  47094  43318  40648  40071  40011  36894  33524  33176
[21]  32886  32118  31314  29983  29517  29333  29213  28706  28553  28369
[31]  28282  28260  27487  26152  25963  25613  25039  24059  23251  22995
[41]  22429  21881  21864  21559  21009  20980  20913  20643  20618  20228
[51]  20021  19534  19377  19326  19274  19140  18524  17898  16839  16691
[61]  16548  16246  15994  15793  15716  15697  15600  15057  14950  14913
[71]  14550  14544  14177  13882  13657  13571  13512  13366  13204  13058
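
The following sketch walks through the same conversion on two lines copied from the output above, to show why the commas have to go before as.integer() is applied. The variable names lines, votes_chr, and votes are chosen here for illustration only.

```r
# Two lines in the same format as the page source shown above.
lines <- c(
  "\t\t  <p class=\"result_count\">243,011票</p>",
  "\t\t  <p class=\"result_count\">92,110票</p>"
)

# Keep only the text captured by (.*), i.e. the part between the tag and 票.
votes_chr <- gsub(".*<p class=\"result_count\">(.*)票</p>", "\\1", lines)
votes_chr                                          # "243,011" "92,110"

# Converting directly fails: strings with commas become NA (with a warning).
direct <- suppressWarnings(as.integer(votes_chr))
direct                                             # NA NA

# Remove the commas first, then convert.
votes <- as.integer(gsub(",", "", votes_chr, fixed = TRUE))
votes                                              # 243011 92110

# Integers can now be used in arithmetic.
sum(votes)
```

fixed = TRUE treats the comma as a literal string rather than a regular expression; for a single comma the result is the same either way.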

1-3. Select only the information needed (names)

・In the same way, extract the members' names.

result_name = grep("result_name", result)

・Check the first 6 matching lines to confirm the extraction worked.

head(result[result_name])
[1] "\t\t                    <h4 class=\"result_name\">指原 莉乃</h4>"  
[2] "\t\t                    <h4 class=\"result_name\">渡辺 麻友</h4>"  
[3] "\t\t                    <h4 class=\"result_name\">松井 珠理奈</h4>"
[4] "\t\t                    <h4 class=\"result_name\">山本 彩</h4>"    
[5] "\t\t                    <h4 class=\"result_name\">柏木 由紀</h4>"  
[6] "\t\t                    <h4 class=\"result_name\">宮脇 咲良</h4>"  

・Use gsub() to remove everything except the name.

gsub(".*<h4 class=\"result_name\">(.*)</h4>","\\1",result[result_name])
 [1] "指原 莉乃"                    "渡辺 麻友"                   
 [3] "松井 珠理奈"                  "山本 彩"                     
 [5] "柏木 由紀"                    "宮脇 咲良"                   
 [7] "須田 亜香里"                  "島崎 遥香"                   
 [9] "兒玉 遥"                      "武藤 十夢"                   
[11] "横山 由依"                    "北原 里英"                   
[13] "向井地 美音"                  "岡田 奈々"                   
[15] "高橋 朱里"                    "にゃんにゃん仮面(小嶋陽菜)"
[17] "峯岸 みなみ"                  "入山 杏奈"                   
[19] "小嶋 真子"                    "高柳 明音"                   
[21] "込山 榛香"                    "大場 美奈"                   
[23] "朝長 美桜"                    "白間 美瑠"                   
[25] "沖田 彩華"                    "加藤 玲奈"                   
[27] "川本 紗矢"                    "矢吹 奈子"                   
[29] "古畑 奈和"                    "惣田 紗莉渚"                 
[31] "竹内 彩姫"                    "大島 涼花"                   
[33] "矢倉 楓子"                    "倉野尾 成美"                 
[35] "江籠 裕奈"                    "本村 碧唯"                   
[37] "木崎 ゆりあ"                  "佐々木 優佳里"               
[39] "薮下 柊"                      "渕上 舞"                     
[41] "藤江 れいな"                  "冨吉 明日香"                 
[43] "田島 芽瑠"                    "須藤 凜々花"                 
[45] "田中 美久"                    "松岡 菜摘"                   
[47] "茂木 忍"                      "井上 由莉耶"                 
[49] "二村 春香"                    "森保 まどか"                 
[51] "岩立 沙穂"                    "太田 夢莉"                   
[53] "神志那 結衣"                  "竹内 舞"                     
[55] "谷 真理佳"                    "渋谷 凪咲"                   
[57] "岡田 彩花"                    "植木 南央"                   
[59] "坂口 理子"                    "駒田 京伽"                   
[61] "西野 未姫"                    "大和田 南那"                 
[63] "酒井 萌衣"                    "北川 綾巴"                   
[65] "宮前 杏実"                    "岸野 里香"                   
[67] "熊崎 晴香"                    "木本 花音"                   
[69] "谷口 めぐ"                    "坂口 渚沙"                   
[71] "山内 鈴蘭"                    "秋吉 優花"                   
[73] "大森 美優"                    "鎌田 菜月"                   
[75] "佐藤 すみれ"                  "加藤 美南"                   
[77] "吉田 朱里"                    "宮崎 美穂"                   
[79] "日高 優月"                    "村重 杏奈"                   

・Name the result of the gsub() substitution above, which now contains only the names, akb_names.

akb_names = gsub(".*<h4 class=\"result_name\">(.*)</h4>","\\1",result[result_name])

・Check the result.

akb_names
 [1] "指原 莉乃"                    "渡辺 麻友"                   
 [3] "松井 珠理奈"                  "山本 彩"                     
 [5] "柏木 由紀"                    "宮脇 咲良"                   
 [7] "須田 亜香里"                  "島崎 遥香"                   
 [9] "兒玉 遥"                      "武藤 十夢"                   
[11] "横山 由依"                    "北原 里英"                   
[13] "向井地 美音"                  "岡田 奈々"                   
[15] "高橋 朱里"                    "にゃんにゃん仮面(小嶋陽菜)"
[17] "峯岸 みなみ"                  "入山 杏奈"                   
[19] "小嶋 真子"                    "高柳 明音"                   
[21] "込山 榛香"                    "大場 美奈"                   
[23] "朝長 美桜"                    "白間 美瑠"                   
[25] "沖田 彩華"                    "加藤 玲奈"                   
[27] "川本 紗矢"                    "矢吹 奈子"                   
[29] "古畑 奈和"                    "惣田 紗莉渚"                 
[31] "竹内 彩姫"                    "大島 涼花"                   
[33] "矢倉 楓子"                    "倉野尾 成美"                 
[35] "江籠 裕奈"                    "本村 碧唯"                   
[37] "木崎 ゆりあ"                  "佐々木 優佳里"               
[39] "薮下 柊"                      "渕上 舞"                     
[41] "藤江 れいな"                  "冨吉 明日香"                 
[43] "田島 芽瑠"                    "須藤 凜々花"                 
[45] "田中 美久"                    "松岡 菜摘"                   
[47] "茂木 忍"                      "井上 由莉耶"                 
[49] "二村 春香"                    "森保 まどか"                 
[51] "岩立 沙穂"                    "太田 夢莉"                   
[53] "神志那 結衣"                  "竹内 舞"                     
[55] "谷 真理佳"                    "渋谷 凪咲"                   
[57] "岡田 彩花"                    "植木 南央"                   
[59] "坂口 理子"                    "駒田 京伽"                   
[61] "西野 未姫"                    "大和田 南那"                 
[63] "酒井 萌衣"                    "北川 綾巴"                   
[65] "宮前 杏実"                    "岸野 里香"                   
[67] "熊崎 晴香"                    "木本 花音"                   
[69] "谷口 めぐ"                    "坂口 渚沙"                   
[71] "山内 鈴蘭"                    "秋吉 優花"                   
[73] "大森 美優"                    "鎌田 菜月"                   
[75] "佐藤 すみれ"                  "加藤 美南"                   
[77] "吉田 朱里"                    "宮崎 美穂"                   
[79] "日高 優月"                    "村重 杏奈"                   
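
Because the vote counts and the names are extracted with the same grep()/gsub() pattern, the two steps could be wrapped in one small helper. The function extract_by_class() below is a hypothetical helper written for this sketch (it is not part of R or of the original notes), and it assumes one tag per line, as in the source shown above.

```r
# Hypothetical helper: return the text inside <tag class="class">...</tag>
# for every line that contains such a tag (assumes one tag per line).
extract_by_class <- function(lines, tag, class) {
  pattern <- sprintf(".*<%s class=\"%s\">(.*)</%s>.*", tag, class, tag)
  hits <- grep(pattern, lines, value = TRUE)   # keep matching lines only
  trimws(sub(pattern, "\\1", hits))            # keep the captured text
}

# Sample lines in the same format as the page source.
sample_lines <- c(
  "\t\t  <h4 class=\"result_name\">指原 莉乃</h4>",
  "\t\t  <p class=\"result_count\">243,011票</p>",
  "\t\t  <h4 class=\"result_name\">渡辺 麻友</h4>",
  "\t\t  <p class=\"result_count\">175,613票</p>"
)

sample_names  <- extract_by_class(sample_lines, "h4", "result_name")
sample_counts <- extract_by_class(sample_lines, "p", "result_count")
sample_names
sample_counts
```

The counts returned this way still carry the 票 suffix and the commas, so the cleanup from 1-2 is still needed afterwards.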

1-4. Build a data frame

・Use data.frame() to combine akb_names and akb_count into a data frame, and name it df.akb.

df.akb = data.frame(akb_names, akb_count)

・Check that the data frame was built correctly.

df.akb
                      akb_names akb_count
1                     指原 莉乃    243011
2                     渡辺 麻友    175613
3                   松井 珠理奈    112341
4                       山本 彩    110411
5                     柏木 由紀     92110
6                     宮脇 咲良     78279
7                   須田 亜香里     69159
8                     島崎 遥香     68126
9                       兒玉 遥     60591
10                    武藤 十夢     58624
11                    横山 由依     58610
12                    北原 里英     50190
13                  向井地 美音     47094
14                    岡田 奈々     43318
15                    高橋 朱里     40648
16 にゃんにゃん仮面(小嶋陽菜)     40071
17                  峯岸 みなみ     40011
18                    入山 杏奈     36894
19                    小嶋 真子     33524
20                    高柳 明音     33176
21                    込山 榛香     32886
22                    大場 美奈     32118
23                    朝長 美桜     31314
24                    白間 美瑠     29983
25                    沖田 彩華     29517
26                    加藤 玲奈     29333
27                    川本 紗矢     29213
28                    矢吹 奈子     28706
29                    古畑 奈和     28553
30                  惣田 紗莉渚     28369
31                    竹内 彩姫     28282
32                    大島 涼花     28260
33                    矢倉 楓子     27487
34                  倉野尾 成美     26152
35                    江籠 裕奈     25963
36                    本村 碧唯     25613
37                  木崎 ゆりあ     25039
38                佐々木 優佳里     24059
39                      薮下 柊     23251
40                      渕上 舞     22995
41                  藤江 れいな     22429
42                  冨吉 明日香     21881
43                    田島 芽瑠     21864
44                  須藤 凜々花     21559
45                    田中 美久     21009
46                    松岡 菜摘     20980
47                      茂木 忍     20913
48                  井上 由莉耶     20643
49                    二村 春香     20618
50                  森保 まどか     20228
51                    岩立 沙穂     20021
52                    太田 夢莉     19534
53                  神志那 結衣     19377
54                      竹内 舞     19326
55                    谷 真理佳     19274
56                    渋谷 凪咲     19140
57                    岡田 彩花     18524
58                    植木 南央     17898
59                    坂口 理子     16839
60                    駒田 京伽     16691
61                    西野 未姫     16548
62                  大和田 南那     16246
63                    酒井 萌衣     15994
64                    北川 綾巴     15793
65                    宮前 杏実     15716
66                    岸野 里香     15697
67                    熊崎 晴香     15600
68                    木本 花音     15057
69                    谷口 めぐ     14950
70                    坂口 渚沙     14913
71                    山内 鈴蘭     14550
72                    秋吉 優花     14544
73                    大森 美優     14177
74                    鎌田 菜月     13882
75                  佐藤 すみれ     13657
76                    加藤 美南     13571
77                    吉田 朱里     13512
78                    宮崎 美穂     13366
79                    日高 優月     13204
80                    村重 杏奈     13058
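
A minimal sketch of the same step on the top three rows from the table above, with two optional safeguards added here as assumptions: a length check before combining, and stringsAsFactors = FALSE so that names stay character strings (in R versions before 4.0 the default converted them to factors).

```r
# Top three entries copied from the output above.
akb_names <- c("指原 莉乃", "渡辺 麻友", "松井 珠理奈")
akb_count <- c(243011L, 175613L, 112341L)

# The two vectors must line up one-to-one before they are combined.
stopifnot(length(akb_names) == length(akb_count))

df.akb <- data.frame(akb_names, akb_count, stringsAsFactors = FALSE)

str(df.akb)   # column types: character and integer

# Rows can now be filtered by vote count.
top <- df.akb[df.akb$akb_count > 150000, ]
top
```

If the two grep() calls ever return different numbers of lines (for example, because the page layout changed), the length check stops the script before a misaligned table is built.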

・Save the data frame df.akb as a CSV file.

write.csv(df.akb, "akb48.csv", 
          fileEncoding = "CP932")
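
The sketch below shows a write/read round trip on a temporary file. It uses placeholder ASCII names and "UTF-8" instead of the "CP932" used above (CP932 is the Shift_JIS variant used by Japanese Windows), and it adds row.names = FALSE so that the row numbers are not written as an extra column. All of these are choices made for this sketch, not part of the original procedure.

```r
# Placeholder data (ASCII names are hypothetical stand-ins).
df <- data.frame(
  name  = c("member_a", "member_b"),
  votes = c(243011L, 175613L),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing is left in the working directory.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back with the same encoding and compare with the original.
df_back <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
all.equal(df, df_back)

unlink(path)   # remove the temporary file
```

Reading the file back with the same fileEncoding is a quick way to confirm that the saved CSV can be reloaded without garbled characters.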

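2. Scraping the 2017 general election data (Tokyo)

2-1. Read the data from the Web

・Here we use R to read the data from the Asahi Shimbun 2017 general election site (http://www.asahi.com/senkyo/senkyo2017/).

・On the Asahi Shimbun 2017 general election page for the Tokyo constituencies, choose "View" → "Developer" → "View Source".

・In the 2017 general election (Tokyo 1st district), Banri Kaieda (海江田 万里) received the most votes and won the single-seat constituency.
・Check the page source that corresponds to this display.

・Use R to connect to the website, fetch the whole page source at once, and name it tokyo.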

tokyo = readLines("http://www.asahi.com/senkyo/senkyo2017/kaihyo/A13.html", encoding = "UTF-8")

・Display the first 6 lines of tokyo.

head(tokyo)
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"                                                                                                                                                                     
[2] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"                                                                                  
[3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"ja\" lang=\"ja\" dir=\"ltr\" xmlns:og=\"http://ogp.me/ns#\" xmlns:mixi=\"http://mixi-platform.com/ns#\" xmlns:fb=\"http://www.facebook.com/2008/fbml\">"
[4] "<head>"                                                                                                                                                                                                         
[5] "<!-- DTM上 -->"                                                                                                                                                                                                 
[6] "<script src=\"//assets.adobedtm.com/d7e679c95b1f3fceafd1fcdf47a9b3bc7a11d039/satelliteLib-b5f070ddaa8837c4b9c5d3e0509562a889b01b07.js\"></script>"                                                              

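2-2. Select only the information needed (names)

・"View Source" for the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's name follows the string sei.

・Find the lines that contain the name information following sei, and name their indices sei.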

sei <- grep("\"sei\"", tokyo)
tokyo[sei]
 [1] "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>"
 [2] "<td class=\"namae\"><div><span class=\"sei\">山田</span><span class=\"mei\">美樹</span><span class=\"age\">(43)</span></div></td>"  
 [3] "<td class=\"namae\"><div><span class=\"sei\">松沢</span><span class=\"mei\">香</span><span class=\"age\">(39)</span></div></td>"    
 [4] "<td class=\"namae\"><div><span class=\"sei\">原口</span><span class=\"mei\">実季</span><span class=\"age\">(28)</span></div></td>"  
 [5] "<td class=\"namae\"><div><span class=\"sei\">犬丸</span><span class=\"mei\">光加</span><span class=\"age\">(57)</span></div></td>"  
 [6] "<td class=\"namae\"><div><span class=\"sei\">又吉</span><span class=\"mei\">光雄</span><span class=\"age\">(73)</span></div></td>"  
 [7] "<td class=\"namae\"><div><span class=\"sei\">辻</span><span class=\"mei\">清人</span><span class=\"age\">(38)</span></div></td>"    
 [8] "<td class=\"namae\"><div><span class=\"sei\">松尾</span><span class=\"mei\">明弘</span><span class=\"age\">(42)</span></div></td>"  
 [9] "<td class=\"namae\"><div><span class=\"sei\">鳩山</span><span class=\"mei\">太郎</span><span class=\"age\">(43)</span></div></td>"  
[10] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">宏高</span><span class=\"age\">(53)</span></div></td>"  
[11] "<td class=\"namae\"><div><span class=\"sei\">松原</span><span class=\"mei\">仁</span><span class=\"age\">(61)</span></div></td>"    
[12] "<td class=\"namae\"><div><span class=\"sei\">香西</span><span class=\"mei\">克介</span><span class=\"age\">(41)</span></div></td>"  
[13] "<td class=\"namae\"><div><span class=\"sei\">平</span><span class=\"mei\">将明</span><span class=\"age\">(50)</span></div></td>"    
[14] "<td class=\"namae\"><div><span class=\"sei\">井戸</span><span class=\"mei\">正枝</span><span class=\"age\">(51)</span></div></td>"  
[15] "<td class=\"namae\"><div><span class=\"sei\">難波</span><span class=\"mei\">美智代</span><span class=\"age\">(43)</span></div></td>"
[16] "<td class=\"namae\"><div><span class=\"sei\">青山</span><span class=\"mei\">昂平</span><span class=\"age\">(26)</span></div></td>"  
[17] "<td class=\"namae\"><div><span class=\"sei\">若宮</span><span class=\"mei\">健嗣</span><span class=\"age\">(56)</span></div></td>"  
[18] "<td class=\"namae\"><div><span class=\"sei\">手塚</span><span class=\"mei\">仁雄</span><span class=\"age\">(51)</span></div></td>"  
[19] "<td class=\"namae\"><div><span class=\"sei\">福田</span><span class=\"mei\">峰之</span><span class=\"age\">(53)</span></div></td>"  
[20] "<td class=\"namae\"><div><span class=\"sei\">落合</span><span class=\"mei\">貴之</span><span class=\"age\">(38)</span></div></td>"  
[21] "<td class=\"namae\"><div><span class=\"sei\">越智</span><span class=\"mei\">隆雄</span><span class=\"age\">(53)</span></div></td>"  
[22] "<td class=\"namae\"><div><span class=\"sei\">植松</span><span class=\"mei\">恵美子</span><span class=\"age\">(49)</span></div></td>"
[23] "<td class=\"namae\"><div><span class=\"sei\">中岡</span><span class=\"mei\">茉妃</span><span class=\"age\">(26)</span></div></td>"  
[24] "<td class=\"namae\"><div><span class=\"sei\">長妻</span><span class=\"mei\">昭</span><span class=\"age\">(57)</span></div></td>"    
[25] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">文明</span><span class=\"age\">(68)</span></div></td>"  
[26] "<td class=\"namae\"><div><span class=\"sei\">荒木</span><span class=\"mei\">章博</span><span class=\"age\">(64)</span></div></td>"  
[27] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">郁磨</span><span class=\"age\">(26)</span></div></td>"  
[28] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">伸晃</span><span class=\"age\">(60)</span></div></td>"  
[29] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">晴美</span><span class=\"age\">(45)</span></div></td>"  
[30] "<td class=\"namae\"><div><span class=\"sei\">木内</span><span class=\"mei\">孝胤</span><span class=\"age\">(51)</span></div></td>"  
[31] "<td class=\"namae\"><div><span class=\"sei\">長内</span><span class=\"mei\">史子</span><span class=\"age\">(29)</span></div></td>"  
[32] "<td class=\"namae\"><div><span class=\"sei\">円</span><span class=\"mei\">より子</span><span class=\"age\">(70)</span></div></td>"  
[33] "<td class=\"namae\"><div><span class=\"sei\">斎藤</span><span class=\"mei\">郁真</span><span class=\"age\">(29)</span></div></td>"  
[34] "<td class=\"namae\"><div><span class=\"sei\">菅原</span><span class=\"mei\">一秀</span><span class=\"age\">(55)</span></div></td>"  
[35] "<td class=\"namae\"><div><span class=\"sei\">高松</span><span class=\"mei\">智之</span><span class=\"age\">(43)</span></div></td>"  
[36] "<td class=\"namae\"><div><span class=\"sei\">原</span><span class=\"mei\">純子</span><span class=\"age\">(53)</span></div></td>"    
[37] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">吉成</span><span class=\"age\">(62)</span></div></td>"  
[38] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">隼人</span><span class=\"age\">(40)</span></div></td>"  
[39] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">庸介</span><span class=\"age\">(41)</span></div></td>"  
[40] "<td class=\"namae\"><div><span class=\"sei\">若狭</span><span class=\"mei\">勝</span><span class=\"age\">(60)</span></div></td>"    
[41] "<td class=\"namae\"><div><span class=\"sei\">岸</span><span class=\"mei\">良信</span><span class=\"age\">(62)</span></div></td>"    
[42] "<td class=\"namae\"><div><span class=\"sei\">小山</span><span class=\"mei\">徹</span><span class=\"age\">(42)</span></div></td>"    
[43] "<td class=\"namae\"><div><span class=\"sei\">吉井</span><span class=\"mei\">利光</span><span class=\"age\">(35)</span></div></td>"  
[44] "<td class=\"namae\"><div><span class=\"sei\">下村</span><span class=\"mei\">博文</span><span class=\"age\">(63)</span></div></td>"  
[45] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">順一郎</span><span class=\"age\">(42)</span></div></td>"
[46] "<td class=\"namae\"><div><span class=\"sei\">宍戸</span><span class=\"mei\">千絵</span><span class=\"age\">(39)</span></div></td>"  
[47] "<td class=\"namae\"><div><span class=\"sei\">小堤</span><span class=\"mei\">東</span><span class=\"age\">(28)</span></div></td>"    
[48] "<td class=\"namae\"><div><span class=\"sei\">太田</span><span class=\"mei\">昭宏</span><span class=\"age\">(72)</span></div></td>"  
[49] "<td class=\"namae\"><div><span class=\"sei\">池内</span><span class=\"mei\">沙織</span><span class=\"age\">(35)</span></div></td>"  
[50] "<td class=\"namae\"><div><span class=\"sei\">中村</span><span class=\"mei\">勝</span><span class=\"age\">(66)</span></div></td>"    
[51] "<td class=\"namae\"><div><span class=\"sei\">鴨下</span><span class=\"mei\">一郎</span><span class=\"age\">(68)</span></div></td>"  
[52] "<td class=\"namae\"><div><span class=\"sei\">北條</span><span class=\"mei\">智彦</span><span class=\"age\">(34)</span></div></td>"  
[53] "<td class=\"namae\"><div><span class=\"sei\">祖父江</span><span class=\"mei\">元希</span><span class=\"age\">(42)</span></div></td>"
[54] "<td class=\"namae\"><div><span class=\"sei\">松島</span><span class=\"mei\">みどり</span><span class=\"age\">(61)</span></div></td>"
[55] "<td class=\"namae\"><div><span class=\"sei\">矢作</span><span class=\"mei\">麻子</span><span class=\"age\">(39)</span></div></td>"  
[56] "<td class=\"namae\"><div><span class=\"sei\">阿藤</span><span class=\"mei\">和之</span><span class=\"age\">(46)</span></div></td>"  
[57] "<td class=\"namae\"><div><span class=\"sei\">清井</span><span class=\"mei\">美穂</span><span class=\"age\">(54)</span></div></td>"  
[58] "<td class=\"namae\"><div><span class=\"sei\">大塚</span><span class=\"mei\">紀久雄</span><span class=\"age\">(76)</span></div></td>"
[59] "<td class=\"namae\"><div><span class=\"sei\">秋元</span><span class=\"mei\">司</span><span class=\"age\">(46)</span></div></td>"    
[60] "<td class=\"namae\"><div><span class=\"sei\">柿沢</span><span class=\"mei\">未途</span><span class=\"age\">(46)</span></div></td>"  
[61] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">年男</span><span class=\"age\">(69)</span></div></td>"  
[62] "<td class=\"namae\"><div><span class=\"sei\">猪野</span><span class=\"mei\">隆</span><span class=\"age\">(52)</span></div></td>"    
[63] "<td class=\"namae\"><div><span class=\"sei\">大西</span><span class=\"mei\">英男</span><span class=\"age\">(71)</span></div></td>"  
[64] "<td class=\"namae\"><div><span class=\"sei\">初鹿</span><span class=\"mei\">明博</span><span class=\"age\">(48)</span></div></td>"  
[65] "<td class=\"namae\"><div><span class=\"sei\">田村</span><span class=\"mei\">謙治</span><span class=\"age\">(49)</span></div></td>"  
[66] "<td class=\"namae\"><div><span class=\"sei\">平沢</span><span class=\"mei\">勝栄</span><span class=\"age\">(72)</span></div></td>"  
[67] "<td class=\"namae\"><div><span class=\"sei\">西田</span><span class=\"mei\">主税</span><span class=\"age\">(55)</span></div></td>"  
[68] "<td class=\"namae\"><div><span class=\"sei\">新井</span><span class=\"mei\">杉生</span><span class=\"age\">(58)</span></div></td>"  
[69] "<td class=\"namae\"><div><span class=\"sei\">菅</span><span class=\"mei\">直人</span><span class=\"age\">(71)</span></div></td>"    
[70] "<td class=\"namae\"><div><span class=\"sei\">土屋</span><span class=\"mei\">正忠</span><span class=\"age\">(75)</span></div></td>"  
[71] "<td class=\"namae\"><div><span class=\"sei\">鴇田</span><span class=\"mei\">敦</span><span class=\"age\">(51)</span></div></td>"    
[72] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">洋平</span><span class=\"age\">(44)</span></div></td>"  
[73] "<td class=\"namae\"><div><span class=\"sei\">末松</span><span class=\"mei\">義規</span><span class=\"age\">(60)</span></div></td>"  
[74] "<td class=\"namae\"><div><span class=\"sei\">佐々木</span><span class=\"mei\">里加</span><span class=\"age\">(50)</span></div></td>"
[75] "<td class=\"namae\"><div><span class=\"sei\">杉下</span><span class=\"mei\">茂雄</span><span class=\"age\">(68)</span></div></td>"  
[76] "<td class=\"namae\"><div><span class=\"sei\">木原</span><span class=\"mei\">誠二</span><span class=\"age\">(47)</span></div></td>"  
[77] "<td class=\"namae\"><div><span class=\"sei\">宮本</span><span class=\"mei\">徹</span><span class=\"age\">(45)</span></div></td>"    
[78] "<td class=\"namae\"><div><span class=\"sei\">鹿野</span><span class=\"mei\">晃</span><span class=\"age\">(44)</span></div></td>"    
[79] "<td class=\"namae\"><div><span class=\"sei\">長島</span><span class=\"mei\">昭久</span><span class=\"age\">(55)</span></div></td>"  
[80] "<td class=\"namae\"><div><span class=\"sei\">小田原</span><span class=\"mei\">潔</span><span class=\"age\">(53)</span></div></td>"  
[81] "<td class=\"namae\"><div><span class=\"sei\">小糸</span><span class=\"mei\">健介</span><span class=\"age\">(35)</span></div></td>"  
[82] "<td class=\"namae\"><div><span class=\"sei\">天木</span><span class=\"mei\">直人</span><span class=\"age\">(70)</span></div></td>"  
[83] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">達也</span><span class=\"age\">(56)</span></div></td>"  
[84] "<td class=\"namae\"><div><span class=\"sei\">山花</span><span class=\"mei\">郁夫</span><span class=\"age\">(50)</span></div></td>"  
[85] "<td class=\"namae\"><div><span class=\"sei\">金ケ崎</span><span class=\"mei\">絵美</span><span class=\"age\">(41)</span></div></td>"
[86] "<td class=\"namae\"><div><span class=\"sei\">阿部</span><span class=\"mei\">真</span><span class=\"age\">(43)</span></div></td>"    
[87] "<td class=\"namae\"><div><span class=\"sei\">小倉</span><span class=\"mei\">将信</span><span class=\"age\">(36)</span></div></td>"  
[88] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">俊輔</span><span class=\"age\">(38)</span></div></td>"  
[89] "<td class=\"namae\"><div><span class=\"sei\">松村</span><span class=\"mei\">亮佑</span><span class=\"age\">(37)</span></div></td>"  
[90] "<td class=\"namae\"><div><span class=\"sei\">萩生田</span><span class=\"mei\">光一</span><span class=\"age\">(54)</span></div></td>"
[91] "<td class=\"namae\"><div><span class=\"sei\">高橋</span><span class=\"mei\">斉久</span><span class=\"age\">(44)</span></div></td>"  
[92] "<td class=\"namae\"><div><span class=\"sei\">吉羽</span><span class=\"mei\">美華</span><span class=\"age\">(37)</span></div></td>"  
[93] "<td class=\"namae\"><div><span class=\"sei\">飯田</span><span class=\"mei\">美弥子</span><span class=\"age\">(57)</span></div></td>"
[94] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">信治</span><span class=\"age\">(48)</span></div></td>"  
[95] "<td class=\"namae\"><div><span class=\"sei\">山下</span><span class=\"mei\">容子</span><span class=\"age\">(58)</span></div></td>"  
[96] "<td class=\"namae\"><div><span class=\"sei\">小沢</span><span class=\"mei\">鋭仁</span><span class=\"age\">(63)</span></div></td>"  
[97] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">宣</span><span class=\"age\">(43)</span></div></td>"    

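・Use gsub() to remove the HTML tags around the name. The pattern defines three capture groups (sei, mei, age); the replacement also references \\4, which has no matching group and expands to an empty string, which is why each element ends with a trailing ", ".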

tokyo_name <- gsub("<td class=\"namae\"><div><span class=\"sei\">(.*)</span><span class=\"mei\">(.*)</span><span class=\"age\">(.*)</span></div></td>","\\1, \\2, \\3, \\4", tokyo[sei])
tokyo_name
 [1] "海江田, 万里, (68), " "山田, 美樹, (43), "   "松沢, 香, (39), "    
 [4] "原口, 実季, (28), "   "犬丸, 光加, (57), "   "又吉, 光雄, (73), "  
 [7] "辻, 清人, (38), "     "松尾, 明弘, (42), "   "鳩山, 太郎, (43), "  
[10] "石原, 宏高, (53), "   "松原, 仁, (61), "     "香西, 克介, (41), "  
[13] "平, 将明, (50), "     "井戸, 正枝, (51), "   "難波, 美智代, (43), "
[16] "青山, 昂平, (26), "   "若宮, 健嗣, (56), "   "手塚, 仁雄, (51), "  
[19] "福田, 峰之, (53), "   "落合, 貴之, (38), "   "越智, 隆雄, (53), "  
[22] "植松, 恵美子, (49), " "中岡, 茉妃, (26), "   "長妻, 昭, (57), "    
[25] "松本, 文明, (68), "   "荒木, 章博, (64), "   "井上, 郁磨, (26), "  
[28] "石原, 伸晃, (60), "   "吉田, 晴美, (45), "   "木内, 孝胤, (51), "  
[31] "長内, 史子, (29), "   "円, より子, (70), "   "斎藤, 郁真, (29), "  
[34] "菅原, 一秀, (55), "   "高松, 智之, (43), "   "原, 純子, (53), "    
[37] "前田, 吉成, (62), "   "鈴木, 隼人, (40), "   "鈴木, 庸介, (41), "  
[40] "若狭, 勝, (60), "     "岸, 良信, (62), "     "小山, 徹, (42), "    
[43] "吉井, 利光, (35), "   "下村, 博文, (63), "   "前田, 順一郎, (42), "
[46] "宍戸, 千絵, (39), "   "小堤, 東, (28), "     "太田, 昭宏, (72), "  
[49] "池内, 沙織, (35), "   "中村, 勝, (66), "     "鴨下, 一郎, (68), "  
[52] "北條, 智彦, (34), "   "祖父江, 元希, (42), " "松島, みどり, (61), "
[55] "矢作, 麻子, (39), "   "阿藤, 和之, (46), "   "清井, 美穂, (54), "  
[58] "大塚, 紀久雄, (76), " "秋元, 司, (46), "     "柿沢, 未途, (46), "  
[61] "吉田, 年男, (69), "   "猪野, 隆, (52), "     "大西, 英男, (71), "  
[64] "初鹿, 明博, (48), "   "田村, 謙治, (49), "   "平沢, 勝栄, (72), "  
[67] "西田, 主税, (55), "   "新井, 杉生, (58), "   "菅, 直人, (71), "    
[70] "土屋, 正忠, (75), "   "鴇田, 敦, (51), "     "松本, 洋平, (44), "  
[73] "末松, 義規, (60), "   "佐々木, 里加, (50), " "杉下, 茂雄, (68), "  
[76] "木原, 誠二, (47), "   "宮本, 徹, (45), "     "鹿野, 晃, (44), "    
[79] "長島, 昭久, (55), "   "小田原, 潔, (53), "   "小糸, 健介, (35), "  
[82] "天木, 直人, (70), "   "伊藤, 達也, (56), "   "山花, 郁夫, (50), "  
[85] "金ケ崎, 絵美, (41), " "阿部, 真, (43), "     "小倉, 将信, (36), "  
[88] "伊藤, 俊輔, (38), "   "松村, 亮佑, (37), "   "萩生田, 光一, (54), "
[91] "高橋, 斉久, (44), "   "吉羽, 美華, (37), "   "飯田, 美弥子, (57), "
[94] "井上, 信治, (48), "   "山下, 容子, (58), "   "小沢, 鋭仁, (63), "  
[97] "井上, 宣, (43), "    

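・The result above still contains characters other than the name (parentheses, commas, and so on).
 → The following steps replace these leftover characters with empty strings.
・Use gsub() to remove the parentheses ().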

tokyo_name <- gsub("[()]","",tokyo_name)
tokyo_name
 [1] "海江田, 万里, 68, " "山田, 美樹, 43, "   "松沢, 香, 39, "    
 [4] "原口, 実季, 28, "   "犬丸, 光加, 57, "   "又吉, 光雄, 73, "  
 [7] "辻, 清人, 38, "     "松尾, 明弘, 42, "   "鳩山, 太郎, 43, "  
[10] "石原, 宏高, 53, "   "松原, 仁, 61, "     "香西, 克介, 41, "  
[13] "平, 将明, 50, "     "井戸, 正枝, 51, "   "難波, 美智代, 43, "
[16] "青山, 昂平, 26, "   "若宮, 健嗣, 56, "   "手塚, 仁雄, 51, "  
[19] "福田, 峰之, 53, "   "落合, 貴之, 38, "   "越智, 隆雄, 53, "  
[22] "植松, 恵美子, 49, " "中岡, 茉妃, 26, "   "長妻, 昭, 57, "    
[25] "松本, 文明, 68, "   "荒木, 章博, 64, "   "井上, 郁磨, 26, "  
[28] "石原, 伸晃, 60, "   "吉田, 晴美, 45, "   "木内, 孝胤, 51, "  
[31] "長内, 史子, 29, "   "円, より子, 70, "   "斎藤, 郁真, 29, "  
[34] "菅原, 一秀, 55, "   "高松, 智之, 43, "   "原, 純子, 53, "    
[37] "前田, 吉成, 62, "   "鈴木, 隼人, 40, "   "鈴木, 庸介, 41, "  
[40] "若狭, 勝, 60, "     "岸, 良信, 62, "     "小山, 徹, 42, "    
[43] "吉井, 利光, 35, "   "下村, 博文, 63, "   "前田, 順一郎, 42, "
[46] "宍戸, 千絵, 39, "   "小堤, 東, 28, "     "太田, 昭宏, 72, "  
[49] "池内, 沙織, 35, "   "中村, 勝, 66, "     "鴨下, 一郎, 68, "  
[52] "北條, 智彦, 34, "   "祖父江, 元希, 42, " "松島, みどり, 61, "
[55] "矢作, 麻子, 39, "   "阿藤, 和之, 46, "   "清井, 美穂, 54, "  
[58] "大塚, 紀久雄, 76, " "秋元, 司, 46, "     "柿沢, 未途, 46, "  
[61] "吉田, 年男, 69, "   "猪野, 隆, 52, "     "大西, 英男, 71, "  
[64] "初鹿, 明博, 48, "   "田村, 謙治, 49, "   "平沢, 勝栄, 72, "  
[67] "西田, 主税, 55, "   "新井, 杉生, 58, "   "菅, 直人, 71, "    
[70] "土屋, 正忠, 75, "   "鴇田, 敦, 51, "     "松本, 洋平, 44, "  
[73] "末松, 義規, 60, "   "佐々木, 里加, 50, " "杉下, 茂雄, 68, "  
[76] "木原, 誠二, 47, "   "宮本, 徹, 45, "     "鹿野, 晃, 44, "    
[79] "長島, 昭久, 55, "   "小田原, 潔, 53, "   "小糸, 健介, 35, "  
[82] "天木, 直人, 70, "   "伊藤, 達也, 56, "   "山花, 郁夫, 50, "  
[85] "金ケ崎, 絵美, 41, " "阿部, 真, 43, "     "小倉, 将信, 36, "  
[88] "伊藤, 俊輔, 38, "   "松村, 亮佑, 37, "   "萩生田, 光一, 54, "
[91] "高橋, 斉久, 44, "   "吉羽, 美華, 37, "   "飯田, 美弥子, 57, "
[94] "井上, 信治, 48, "   "山下, 容子, 58, "   "小沢, 鋭仁, 63, "  
[97] "井上, 宣, 43, "    

Use the gsub function to remove the commas.

tokyo_name <- gsub(",","",tokyo_name)
tokyo_name
 [1] "海江田 万里 68 " "山田 美樹 43 "   "松沢 香 39 "    
 [4] "原口 実季 28 "   "犬丸 光加 57 "   "又吉 光雄 73 "  
 [7] "辻 清人 38 "     "松尾 明弘 42 "   "鳩山 太郎 43 "  
[10] "石原 宏高 53 "   "松原 仁 61 "     "香西 克介 41 "  
[13] "平 将明 50 "     "井戸 正枝 51 "   "難波 美智代 43 "
[16] "青山 昂平 26 "   "若宮 健嗣 56 "   "手塚 仁雄 51 "  
[19] "福田 峰之 53 "   "落合 貴之 38 "   "越智 隆雄 53 "  
[22] "植松 恵美子 49 " "中岡 茉妃 26 "   "長妻 昭 57 "    
[25] "松本 文明 68 "   "荒木 章博 64 "   "井上 郁磨 26 "  
[28] "石原 伸晃 60 "   "吉田 晴美 45 "   "木内 孝胤 51 "  
[31] "長内 史子 29 "   "円 より子 70 "   "斎藤 郁真 29 "  
[34] "菅原 一秀 55 "   "高松 智之 43 "   "原 純子 53 "    
[37] "前田 吉成 62 "   "鈴木 隼人 40 "   "鈴木 庸介 41 "  
[40] "若狭 勝 60 "     "岸 良信 62 "     "小山 徹 42 "    
[43] "吉井 利光 35 "   "下村 博文 63 "   "前田 順一郎 42 "
[46] "宍戸 千絵 39 "   "小堤 東 28 "     "太田 昭宏 72 "  
[49] "池内 沙織 35 "   "中村 勝 66 "     "鴨下 一郎 68 "  
[52] "北條 智彦 34 "   "祖父江 元希 42 " "松島 みどり 61 "
[55] "矢作 麻子 39 "   "阿藤 和之 46 "   "清井 美穂 54 "  
[58] "大塚 紀久雄 76 " "秋元 司 46 "     "柿沢 未途 46 "  
[61] "吉田 年男 69 "   "猪野 隆 52 "     "大西 英男 71 "  
[64] "初鹿 明博 48 "   "田村 謙治 49 "   "平沢 勝栄 72 "  
[67] "西田 主税 55 "   "新井 杉生 58 "   "菅 直人 71 "    
[70] "土屋 正忠 75 "   "鴇田 敦 51 "     "松本 洋平 44 "  
[73] "末松 義規 60 "   "佐々木 里加 50 " "杉下 茂雄 68 "  
[76] "木原 誠二 47 "   "宮本 徹 45 "     "鹿野 晃 44 "    
[79] "長島 昭久 55 "   "小田原 潔 53 "   "小糸 健介 35 "  
[82] "天木 直人 70 "   "伊藤 達也 56 "   "山花 郁夫 50 "  
[85] "金ケ崎 絵美 41 " "阿部 真 43 "     "小倉 将信 36 "  
[88] "伊藤 俊輔 38 "   "松村 亮佑 37 "   "萩生田 光一 54 "
[91] "高橋 斉久 44 "   "吉羽 美華 37 "   "飯田 美弥子 57 "
[94] "井上 信治 48 "   "山下 容子 58 "   "小沢 鋭仁 63 "  
[97] "井上 宣 43 "    

Use the gsub function to remove the age digits that are still left after each name.

tokyo_name <- gsub("[0-9]+","",tokyo_name)
tokyo_name
 [1] "海江田 万里  " "山田 美樹  "   "松沢 香  "     "原口 実季  "  
 [5] "犬丸 光加  "   "又吉 光雄  "   "辻 清人  "     "松尾 明弘  "  
 [9] "鳩山 太郎  "   "石原 宏高  "   "松原 仁  "     "香西 克介  "  
[13] "平 将明  "     "井戸 正枝  "   "難波 美智代  " "青山 昂平  "  
[17] "若宮 健嗣  "   "手塚 仁雄  "   "福田 峰之  "   "落合 貴之  "  
[21] "越智 隆雄  "   "植松 恵美子  " "中岡 茉妃  "   "長妻 昭  "    
[25] "松本 文明  "   "荒木 章博  "   "井上 郁磨  "   "石原 伸晃  "  
[29] "吉田 晴美  "   "木内 孝胤  "   "長内 史子  "   "円 より子  "  
[33] "斎藤 郁真  "   "菅原 一秀  "   "高松 智之  "   "原 純子  "    
[37] "前田 吉成  "   "鈴木 隼人  "   "鈴木 庸介  "   "若狭 勝  "    
[41] "岸 良信  "     "小山 徹  "     "吉井 利光  "   "下村 博文  "  
[45] "前田 順一郎  " "宍戸 千絵  "   "小堤 東  "     "太田 昭宏  "  
[49] "池内 沙織  "   "中村 勝  "     "鴨下 一郎  "   "北條 智彦  "  
[53] "祖父江 元希  " "松島 みどり  " "矢作 麻子  "   "阿藤 和之  "  
[57] "清井 美穂  "   "大塚 紀久雄  " "秋元 司  "     "柿沢 未途  "  
[61] "吉田 年男  "   "猪野 隆  "     "大西 英男  "   "初鹿 明博  "  
[65] "田村 謙治  "   "平沢 勝栄  "   "西田 主税  "   "新井 杉生  "  
[69] "菅 直人  "     "土屋 正忠  "   "鴇田 敦  "     "松本 洋平  "  
[73] "末松 義規  "   "佐々木 里加  " "杉下 茂雄  "   "木原 誠二  "  
[77] "宮本 徹  "     "鹿野 晃  "     "長島 昭久  "   "小田原 潔  "  
[81] "小糸 健介  "   "天木 直人  "   "伊藤 達也  "   "山花 郁夫  "  
[85] "金ケ崎 絵美  " "阿部 真  "     "小倉 将信  "   "伊藤 俊輔  "  
[89] "松村 亮佑  "   "萩生田 光一  " "高橋 斉久  "   "吉羽 美華  "  
[93] "飯田 美弥子  " "井上 信治  "   "山下 容子  "   "小沢 鋭仁  "  
[97] "井上 宣  "    

Use the gsub function to remove the two half-width spaces left after each name, where the age used to be.

tokyo_name <- gsub("  ","",tokyo_name)
tokyo_name
##  [1] "海江田 万里" "山田 美樹"   "松沢 香"     "原口 実季"   "犬丸 光加"  
##  [6] "又吉 光雄"   "辻 清人"     "松尾 明弘"   "鳩山 太郎"   "石原 宏高"  
## [11] "松原 仁"     "香西 克介"   "平 将明"     "井戸 正枝"   "難波 美智代"
## [16] "青山 昂平"   "若宮 健嗣"   "手塚 仁雄"   "福田 峰之"   "落合 貴之"  
## [21] "越智 隆雄"   "植松 恵美子" "中岡 茉妃"   "長妻 昭"     "松本 文明"  
## [26] "荒木 章博"   "井上 郁磨"   "石原 伸晃"   "吉田 晴美"   "木内 孝胤"  
## [31] "長内 史子"   "円 より子"   "斎藤 郁真"   "菅原 一秀"   "高松 智之"  
## [36] "原 純子"     "前田 吉成"   "鈴木 隼人"   "鈴木 庸介"   "若狭 勝"    
## [41] "岸 良信"     "小山 徹"     "吉井 利光"   "下村 博文"   "前田 順一郎"
## [46] "宍戸 千絵"   "小堤 東"     "太田 昭宏"   "池内 沙織"   "中村 勝"    
## [51] "鴨下 一郎"   "北條 智彦"   "祖父江 元希" "松島 みどり" "矢作 麻子"  
## [56] "阿藤 和之"   "清井 美穂"   "大塚 紀久雄" "秋元 司"     "柿沢 未途"  
## [61] "吉田 年男"   "猪野 隆"     "大西 英男"   "初鹿 明博"   "田村 謙治"  
## [66] "平沢 勝栄"   "西田 主税"   "新井 杉生"   "菅 直人"     "土屋 正忠"  
## [71] "鴇田 敦"     "松本 洋平"   "末松 義規"   "佐々木 里加" "杉下 茂雄"  
## [76] "木原 誠二"   "宮本 徹"     "鹿野 晃"     "長島 昭久"   "小田原 潔"  
## [81] "小糸 健介"   "天木 直人"   "伊藤 達也"   "山花 郁夫"   "金ケ崎 絵美"
## [86] "阿部 真"     "小倉 将信"   "伊藤 俊輔"   "松村 亮佑"   "萩生田 光一"
## [91] "高橋 斉久"   "吉羽 美華"   "飯田 美弥子" "井上 信治"   "山下 容子"  
## [96] "小沢 鋭仁"   "井上 宣"

Use the gsub function to remove the half-width space between the family name and the given name.

tokyo_name <- gsub(" ","",tokyo_name)
tokyo_name
##  [1] "海江田万里" "山田美樹"   "松沢香"     "原口実季"   "犬丸光加"  
##  [6] "又吉光雄"   "辻清人"     "松尾明弘"   "鳩山太郎"   "石原宏高"  
## [11] "松原仁"     "香西克介"   "平将明"     "井戸正枝"   "難波美智代"
## [16] "青山昂平"   "若宮健嗣"   "手塚仁雄"   "福田峰之"   "落合貴之"  
## [21] "越智隆雄"   "植松恵美子" "中岡茉妃"   "長妻昭"     "松本文明"  
## [26] "荒木章博"   "井上郁磨"   "石原伸晃"   "吉田晴美"   "木内孝胤"  
## [31] "長内史子"   "円より子"   "斎藤郁真"   "菅原一秀"   "高松智之"  
## [36] "原純子"     "前田吉成"   "鈴木隼人"   "鈴木庸介"   "若狭勝"    
## [41] "岸良信"     "小山徹"     "吉井利光"   "下村博文"   "前田順一郎"
## [46] "宍戸千絵"   "小堤東"     "太田昭宏"   "池内沙織"   "中村勝"    
## [51] "鴨下一郎"   "北條智彦"   "祖父江元希" "松島みどり" "矢作麻子"  
## [56] "阿藤和之"   "清井美穂"   "大塚紀久雄" "秋元司"     "柿沢未途"  
## [61] "吉田年男"   "猪野隆"     "大西英男"   "初鹿明博"   "田村謙治"  
## [66] "平沢勝栄"   "西田主税"   "新井杉生"   "菅直人"     "土屋正忠"  
## [71] "鴇田敦"     "松本洋平"   "末松義規"   "佐々木里加" "杉下茂雄"  
## [76] "木原誠二"   "宮本徹"     "鹿野晃"     "長島昭久"   "小田原潔"  
## [81] "小糸健介"   "天木直人"   "伊藤達也"   "山花郁夫"   "金ケ崎絵美"
## [86] "阿部真"     "小倉将信"   "伊藤俊輔"   "松村亮佑"   "萩生田光一"
## [91] "高橋斉久"   "吉羽美華"   "飯田美弥子" "井上信治"   "山下容子"  
## [96] "小沢鋭仁"   "井上宣"

・The name data is now cleaned up.
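The same names can probably be obtained in a single step by capturing only the text inside the "sei" and "mei" spans. The following is a minimal sketch, not the method used in these notes. It assumes every candidate line has the same layout as the two sample lines below, and the names html, pattern, and name_once are hypothetical.

```r
# Two sample lines in the layout the page source uses for each candidate.
html <- c(
  "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>",
  "<td class=\"namae\"><div><span class=\"sei\">円</span><span class=\"mei\">より子</span><span class=\"age\">(70)</span></div></td>"
)

# [^<]* matches everything up to the next tag, so each group holds
# only the text of one span (family name, then given name).
pattern <- ".*class=\"sei\">([^<]*)</span><span class=\"mei\">([^<]*)</span>.*"

# Keep only the two captured groups, joined without a separator.
name_once <- sub(pattern, "\\1\\2", html)
name_once
```

Because the age is never captured, the later steps that delete parentheses, commas, digits, and spaces are not needed with this approach.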

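2-3. Extract only the required information (age)

・Next, collect the age data.
・The age appears right after each candidate's name in the source.
・As before, find the source lines that contain "sei" and store their line indices as age.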

age <- grep("\"sei\"", tokyo)
tokyo[age]
 [1] "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>"
 [2] "<td class=\"namae\"><div><span class=\"sei\">山田</span><span class=\"mei\">美樹</span><span class=\"age\">(43)</span></div></td>"  
 [3] "<td class=\"namae\"><div><span class=\"sei\">松沢</span><span class=\"mei\">香</span><span class=\"age\">(39)</span></div></td>"    
 [4] "<td class=\"namae\"><div><span class=\"sei\">原口</span><span class=\"mei\">実季</span><span class=\"age\">(28)</span></div></td>"  
 [5] "<td class=\"namae\"><div><span class=\"sei\">犬丸</span><span class=\"mei\">光加</span><span class=\"age\">(57)</span></div></td>"  
 [6] "<td class=\"namae\"><div><span class=\"sei\">又吉</span><span class=\"mei\">光雄</span><span class=\"age\">(73)</span></div></td>"  
 [7] "<td class=\"namae\"><div><span class=\"sei\">辻</span><span class=\"mei\">清人</span><span class=\"age\">(38)</span></div></td>"    
 [8] "<td class=\"namae\"><div><span class=\"sei\">松尾</span><span class=\"mei\">明弘</span><span class=\"age\">(42)</span></div></td>"  
 [9] "<td class=\"namae\"><div><span class=\"sei\">鳩山</span><span class=\"mei\">太郎</span><span class=\"age\">(43)</span></div></td>"  
[10] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">宏高</span><span class=\"age\">(53)</span></div></td>"  
[11] "<td class=\"namae\"><div><span class=\"sei\">松原</span><span class=\"mei\">仁</span><span class=\"age\">(61)</span></div></td>"    
[12] "<td class=\"namae\"><div><span class=\"sei\">香西</span><span class=\"mei\">克介</span><span class=\"age\">(41)</span></div></td>"  
[13] "<td class=\"namae\"><div><span class=\"sei\">平</span><span class=\"mei\">将明</span><span class=\"age\">(50)</span></div></td>"    
[14] "<td class=\"namae\"><div><span class=\"sei\">井戸</span><span class=\"mei\">正枝</span><span class=\"age\">(51)</span></div></td>"  
[15] "<td class=\"namae\"><div><span class=\"sei\">難波</span><span class=\"mei\">美智代</span><span class=\"age\">(43)</span></div></td>"
[16] "<td class=\"namae\"><div><span class=\"sei\">青山</span><span class=\"mei\">昂平</span><span class=\"age\">(26)</span></div></td>"  
[17] "<td class=\"namae\"><div><span class=\"sei\">若宮</span><span class=\"mei\">健嗣</span><span class=\"age\">(56)</span></div></td>"  
[18] "<td class=\"namae\"><div><span class=\"sei\">手塚</span><span class=\"mei\">仁雄</span><span class=\"age\">(51)</span></div></td>"  
[19] "<td class=\"namae\"><div><span class=\"sei\">福田</span><span class=\"mei\">峰之</span><span class=\"age\">(53)</span></div></td>"  
[20] "<td class=\"namae\"><div><span class=\"sei\">落合</span><span class=\"mei\">貴之</span><span class=\"age\">(38)</span></div></td>"  
[21] "<td class=\"namae\"><div><span class=\"sei\">越智</span><span class=\"mei\">隆雄</span><span class=\"age\">(53)</span></div></td>"  
[22] "<td class=\"namae\"><div><span class=\"sei\">植松</span><span class=\"mei\">恵美子</span><span class=\"age\">(49)</span></div></td>"
[23] "<td class=\"namae\"><div><span class=\"sei\">中岡</span><span class=\"mei\">茉妃</span><span class=\"age\">(26)</span></div></td>"  
[24] "<td class=\"namae\"><div><span class=\"sei\">長妻</span><span class=\"mei\">昭</span><span class=\"age\">(57)</span></div></td>"    
[25] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">文明</span><span class=\"age\">(68)</span></div></td>"  
[26] "<td class=\"namae\"><div><span class=\"sei\">荒木</span><span class=\"mei\">章博</span><span class=\"age\">(64)</span></div></td>"  
[27] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">郁磨</span><span class=\"age\">(26)</span></div></td>"  
[28] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">伸晃</span><span class=\"age\">(60)</span></div></td>"  
[29] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">晴美</span><span class=\"age\">(45)</span></div></td>"  
[30] "<td class=\"namae\"><div><span class=\"sei\">木内</span><span class=\"mei\">孝胤</span><span class=\"age\">(51)</span></div></td>"  
[31] "<td class=\"namae\"><div><span class=\"sei\">長内</span><span class=\"mei\">史子</span><span class=\"age\">(29)</span></div></td>"  
[32] "<td class=\"namae\"><div><span class=\"sei\">円</span><span class=\"mei\">より子</span><span class=\"age\">(70)</span></div></td>"  
[33] "<td class=\"namae\"><div><span class=\"sei\">斎藤</span><span class=\"mei\">郁真</span><span class=\"age\">(29)</span></div></td>"  
[34] "<td class=\"namae\"><div><span class=\"sei\">菅原</span><span class=\"mei\">一秀</span><span class=\"age\">(55)</span></div></td>"  
[35] "<td class=\"namae\"><div><span class=\"sei\">高松</span><span class=\"mei\">智之</span><span class=\"age\">(43)</span></div></td>"  
[36] "<td class=\"namae\"><div><span class=\"sei\">原</span><span class=\"mei\">純子</span><span class=\"age\">(53)</span></div></td>"    
[37] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">吉成</span><span class=\"age\">(62)</span></div></td>"  
[38] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">隼人</span><span class=\"age\">(40)</span></div></td>"  
[39] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">庸介</span><span class=\"age\">(41)</span></div></td>"  
[40] "<td class=\"namae\"><div><span class=\"sei\">若狭</span><span class=\"mei\">勝</span><span class=\"age\">(60)</span></div></td>"    
[41] "<td class=\"namae\"><div><span class=\"sei\">岸</span><span class=\"mei\">良信</span><span class=\"age\">(62)</span></div></td>"    
[42] "<td class=\"namae\"><div><span class=\"sei\">小山</span><span class=\"mei\">徹</span><span class=\"age\">(42)</span></div></td>"    
[43] "<td class=\"namae\"><div><span class=\"sei\">吉井</span><span class=\"mei\">利光</span><span class=\"age\">(35)</span></div></td>"  
[44] "<td class=\"namae\"><div><span class=\"sei\">下村</span><span class=\"mei\">博文</span><span class=\"age\">(63)</span></div></td>"  
[45] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">順一郎</span><span class=\"age\">(42)</span></div></td>"
[46] "<td class=\"namae\"><div><span class=\"sei\">宍戸</span><span class=\"mei\">千絵</span><span class=\"age\">(39)</span></div></td>"  
[47] "<td class=\"namae\"><div><span class=\"sei\">小堤</span><span class=\"mei\">東</span><span class=\"age\">(28)</span></div></td>"    
[48] "<td class=\"namae\"><div><span class=\"sei\">太田</span><span class=\"mei\">昭宏</span><span class=\"age\">(72)</span></div></td>"  
[49] "<td class=\"namae\"><div><span class=\"sei\">池内</span><span class=\"mei\">沙織</span><span class=\"age\">(35)</span></div></td>"  
[50] "<td class=\"namae\"><div><span class=\"sei\">中村</span><span class=\"mei\">勝</span><span class=\"age\">(66)</span></div></td>"    
[51] "<td class=\"namae\"><div><span class=\"sei\">鴨下</span><span class=\"mei\">一郎</span><span class=\"age\">(68)</span></div></td>"  
[52] "<td class=\"namae\"><div><span class=\"sei\">北條</span><span class=\"mei\">智彦</span><span class=\"age\">(34)</span></div></td>"  
[53] "<td class=\"namae\"><div><span class=\"sei\">祖父江</span><span class=\"mei\">元希</span><span class=\"age\">(42)</span></div></td>"
[54] "<td class=\"namae\"><div><span class=\"sei\">松島</span><span class=\"mei\">みどり</span><span class=\"age\">(61)</span></div></td>"
[55] "<td class=\"namae\"><div><span class=\"sei\">矢作</span><span class=\"mei\">麻子</span><span class=\"age\">(39)</span></div></td>"  
[56] "<td class=\"namae\"><div><span class=\"sei\">阿藤</span><span class=\"mei\">和之</span><span class=\"age\">(46)</span></div></td>"  
[57] "<td class=\"namae\"><div><span class=\"sei\">清井</span><span class=\"mei\">美穂</span><span class=\"age\">(54)</span></div></td>"  
[58] "<td class=\"namae\"><div><span class=\"sei\">大塚</span><span class=\"mei\">紀久雄</span><span class=\"age\">(76)</span></div></td>"
[59] "<td class=\"namae\"><div><span class=\"sei\">秋元</span><span class=\"mei\">司</span><span class=\"age\">(46)</span></div></td>"    
[60] "<td class=\"namae\"><div><span class=\"sei\">柿沢</span><span class=\"mei\">未途</span><span class=\"age\">(46)</span></div></td>"  
[61] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">年男</span><span class=\"age\">(69)</span></div></td>"  
[62] "<td class=\"namae\"><div><span class=\"sei\">猪野</span><span class=\"mei\">隆</span><span class=\"age\">(52)</span></div></td>"    
[63] "<td class=\"namae\"><div><span class=\"sei\">大西</span><span class=\"mei\">英男</span><span class=\"age\">(71)</span></div></td>"  
[64] "<td class=\"namae\"><div><span class=\"sei\">初鹿</span><span class=\"mei\">明博</span><span class=\"age\">(48)</span></div></td>"  
[65] "<td class=\"namae\"><div><span class=\"sei\">田村</span><span class=\"mei\">謙治</span><span class=\"age\">(49)</span></div></td>"  
[66] "<td class=\"namae\"><div><span class=\"sei\">平沢</span><span class=\"mei\">勝栄</span><span class=\"age\">(72)</span></div></td>"  
[67] "<td class=\"namae\"><div><span class=\"sei\">西田</span><span class=\"mei\">主税</span><span class=\"age\">(55)</span></div></td>"  
[68] "<td class=\"namae\"><div><span class=\"sei\">新井</span><span class=\"mei\">杉生</span><span class=\"age\">(58)</span></div></td>"  
[69] "<td class=\"namae\"><div><span class=\"sei\">菅</span><span class=\"mei\">直人</span><span class=\"age\">(71)</span></div></td>"    
[70] "<td class=\"namae\"><div><span class=\"sei\">土屋</span><span class=\"mei\">正忠</span><span class=\"age\">(75)</span></div></td>"  
[71] "<td class=\"namae\"><div><span class=\"sei\">鴇田</span><span class=\"mei\">敦</span><span class=\"age\">(51)</span></div></td>"    
[72] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">洋平</span><span class=\"age\">(44)</span></div></td>"  
[73] "<td class=\"namae\"><div><span class=\"sei\">末松</span><span class=\"mei\">義規</span><span class=\"age\">(60)</span></div></td>"  
[74] "<td class=\"namae\"><div><span class=\"sei\">佐々木</span><span class=\"mei\">里加</span><span class=\"age\">(50)</span></div></td>"
[75] "<td class=\"namae\"><div><span class=\"sei\">杉下</span><span class=\"mei\">茂雄</span><span class=\"age\">(68)</span></div></td>"  
[76] "<td class=\"namae\"><div><span class=\"sei\">木原</span><span class=\"mei\">誠二</span><span class=\"age\">(47)</span></div></td>"  
[77] "<td class=\"namae\"><div><span class=\"sei\">宮本</span><span class=\"mei\">徹</span><span class=\"age\">(45)</span></div></td>"    
[78] "<td class=\"namae\"><div><span class=\"sei\">鹿野</span><span class=\"mei\">晃</span><span class=\"age\">(44)</span></div></td>"    
[79] "<td class=\"namae\"><div><span class=\"sei\">長島</span><span class=\"mei\">昭久</span><span class=\"age\">(55)</span></div></td>"  
[80] "<td class=\"namae\"><div><span class=\"sei\">小田原</span><span class=\"mei\">潔</span><span class=\"age\">(53)</span></div></td>"  
[81] "<td class=\"namae\"><div><span class=\"sei\">小糸</span><span class=\"mei\">健介</span><span class=\"age\">(35)</span></div></td>"  
[82] "<td class=\"namae\"><div><span class=\"sei\">天木</span><span class=\"mei\">直人</span><span class=\"age\">(70)</span></div></td>"  
[83] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">達也</span><span class=\"age\">(56)</span></div></td>"  
[84] "<td class=\"namae\"><div><span class=\"sei\">山花</span><span class=\"mei\">郁夫</span><span class=\"age\">(50)</span></div></td>"  
[85] "<td class=\"namae\"><div><span class=\"sei\">金ケ崎</span><span class=\"mei\">絵美</span><span class=\"age\">(41)</span></div></td>"
[86] "<td class=\"namae\"><div><span class=\"sei\">阿部</span><span class=\"mei\">真</span><span class=\"age\">(43)</span></div></td>"    
[87] "<td class=\"namae\"><div><span class=\"sei\">小倉</span><span class=\"mei\">将信</span><span class=\"age\">(36)</span></div></td>"  
[88] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">俊輔</span><span class=\"age\">(38)</span></div></td>"  
[89] "<td class=\"namae\"><div><span class=\"sei\">松村</span><span class=\"mei\">亮佑</span><span class=\"age\">(37)</span></div></td>"  
[90] "<td class=\"namae\"><div><span class=\"sei\">萩生田</span><span class=\"mei\">光一</span><span class=\"age\">(54)</span></div></td>"
[91] "<td class=\"namae\"><div><span class=\"sei\">高橋</span><span class=\"mei\">斉久</span><span class=\"age\">(44)</span></div></td>"  
[92] "<td class=\"namae\"><div><span class=\"sei\">吉羽</span><span class=\"mei\">美華</span><span class=\"age\">(37)</span></div></td>"  
[93] "<td class=\"namae\"><div><span class=\"sei\">飯田</span><span class=\"mei\">美弥子</span><span class=\"age\">(57)</span></div></td>"
[94] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">信治</span><span class=\"age\">(48)</span></div></td>"  
[95] "<td class=\"namae\"><div><span class=\"sei\">山下</span><span class=\"mei\">容子</span><span class=\"age\">(58)</span></div></td>"  
[96] "<td class=\"namae\"><div><span class=\"sei\">小沢</span><span class=\"mei\">鋭仁</span><span class=\"age\">(63)</span></div></td>"  
[97] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">宣</span><span class=\"age\">(43)</span></div></td>"    

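Use the gsub function to pull the family name, given name, and age out of the HTML tags, and store the result as tokyo_age. The pattern has only three capture groups, so \\4 appears to expand to an empty string, which is why every element ends with a trailing ", ".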

tokyo_age <- gsub("<td class=\"namae\"><div><span class=\"sei\">(.*)</span><span class=\"mei\">(.*)</span><span class=\"age\">(.*)</span></div></td>","\\1, \\2, \\3, \\4", tokyo[age])
tokyo_age
 [1] "海江田, 万里, (68), " "山田, 美樹, (43), "   "松沢, 香, (39), "    
 [4] "原口, 実季, (28), "   "犬丸, 光加, (57), "   "又吉, 光雄, (73), "  
 [7] "辻, 清人, (38), "     "松尾, 明弘, (42), "   "鳩山, 太郎, (43), "  
[10] "石原, 宏高, (53), "   "松原, 仁, (61), "     "香西, 克介, (41), "  
[13] "平, 将明, (50), "     "井戸, 正枝, (51), "   "難波, 美智代, (43), "
[16] "青山, 昂平, (26), "   "若宮, 健嗣, (56), "   "手塚, 仁雄, (51), "  
[19] "福田, 峰之, (53), "   "落合, 貴之, (38), "   "越智, 隆雄, (53), "  
[22] "植松, 恵美子, (49), " "中岡, 茉妃, (26), "   "長妻, 昭, (57), "    
[25] "松本, 文明, (68), "   "荒木, 章博, (64), "   "井上, 郁磨, (26), "  
[28] "石原, 伸晃, (60), "   "吉田, 晴美, (45), "   "木内, 孝胤, (51), "  
[31] "長内, 史子, (29), "   "円, より子, (70), "   "斎藤, 郁真, (29), "  
[34] "菅原, 一秀, (55), "   "高松, 智之, (43), "   "原, 純子, (53), "    
[37] "前田, 吉成, (62), "   "鈴木, 隼人, (40), "   "鈴木, 庸介, (41), "  
[40] "若狭, 勝, (60), "     "岸, 良信, (62), "     "小山, 徹, (42), "    
[43] "吉井, 利光, (35), "   "下村, 博文, (63), "   "前田, 順一郎, (42), "
[46] "宍戸, 千絵, (39), "   "小堤, 東, (28), "     "太田, 昭宏, (72), "  
[49] "池内, 沙織, (35), "   "中村, 勝, (66), "     "鴨下, 一郎, (68), "  
[52] "北條, 智彦, (34), "   "祖父江, 元希, (42), " "松島, みどり, (61), "
[55] "矢作, 麻子, (39), "   "阿藤, 和之, (46), "   "清井, 美穂, (54), "  
[58] "大塚, 紀久雄, (76), " "秋元, 司, (46), "     "柿沢, 未途, (46), "  
[61] "吉田, 年男, (69), "   "猪野, 隆, (52), "     "大西, 英男, (71), "  
[64] "初鹿, 明博, (48), "   "田村, 謙治, (49), "   "平沢, 勝栄, (72), "  
[67] "西田, 主税, (55), "   "新井, 杉生, (58), "   "菅, 直人, (71), "    
[70] "土屋, 正忠, (75), "   "鴇田, 敦, (51), "     "松本, 洋平, (44), "  
[73] "末松, 義規, (60), "   "佐々木, 里加, (50), " "杉下, 茂雄, (68), "  
[76] "木原, 誠二, (47), "   "宮本, 徹, (45), "     "鹿野, 晃, (44), "    
[79] "長島, 昭久, (55), "   "小田原, 潔, (53), "   "小糸, 健介, (35), "  
[82] "天木, 直人, (70), "   "伊藤, 達也, (56), "   "山花, 郁夫, (50), "  
[85] "金ケ崎, 絵美, (41), " "阿部, 真, (43), "     "小倉, 将信, (36), "  
[88] "伊藤, 俊輔, (38), "   "松村, 亮佑, (37), "   "萩生田, 光一, (54), "
[91] "高橋, 斉久, (44), "   "吉羽, 美華, (37), "   "飯田, 美弥子, (57), "
[94] "井上, 信治, (48), "   "山下, 容子, (58), "   "小沢, 鋭仁, (63), "  
[97] "井上, 宣, (43), "    

・Remove everything except the age.
Use the gsub function to remove the parentheses.

tokyo_age <- gsub("[()]","",tokyo_age)
tokyo_age
 [1] "海江田, 万里, 68, " "山田, 美樹, 43, "   "松沢, 香, 39, "    
 [4] "原口, 実季, 28, "   "犬丸, 光加, 57, "   "又吉, 光雄, 73, "  
 [7] "辻, 清人, 38, "     "松尾, 明弘, 42, "   "鳩山, 太郎, 43, "  
[10] "石原, 宏高, 53, "   "松原, 仁, 61, "     "香西, 克介, 41, "  
[13] "平, 将明, 50, "     "井戸, 正枝, 51, "   "難波, 美智代, 43, "
[16] "青山, 昂平, 26, "   "若宮, 健嗣, 56, "   "手塚, 仁雄, 51, "  
[19] "福田, 峰之, 53, "   "落合, 貴之, 38, "   "越智, 隆雄, 53, "  
[22] "植松, 恵美子, 49, " "中岡, 茉妃, 26, "   "長妻, 昭, 57, "    
[25] "松本, 文明, 68, "   "荒木, 章博, 64, "   "井上, 郁磨, 26, "  
[28] "石原, 伸晃, 60, "   "吉田, 晴美, 45, "   "木内, 孝胤, 51, "  
[31] "長内, 史子, 29, "   "円, より子, 70, "   "斎藤, 郁真, 29, "  
[34] "菅原, 一秀, 55, "   "高松, 智之, 43, "   "原, 純子, 53, "    
[37] "前田, 吉成, 62, "   "鈴木, 隼人, 40, "   "鈴木, 庸介, 41, "  
[40] "若狭, 勝, 60, "     "岸, 良信, 62, "     "小山, 徹, 42, "    
[43] "吉井, 利光, 35, "   "下村, 博文, 63, "   "前田, 順一郎, 42, "
[46] "宍戸, 千絵, 39, "   "小堤, 東, 28, "     "太田, 昭宏, 72, "  
[49] "池内, 沙織, 35, "   "中村, 勝, 66, "     "鴨下, 一郎, 68, "  
[52] "北條, 智彦, 34, "   "祖父江, 元希, 42, " "松島, みどり, 61, "
[55] "矢作, 麻子, 39, "   "阿藤, 和之, 46, "   "清井, 美穂, 54, "  
[58] "大塚, 紀久雄, 76, " "秋元, 司, 46, "     "柿沢, 未途, 46, "  
[61] "吉田, 年男, 69, "   "猪野, 隆, 52, "     "大西, 英男, 71, "  
[64] "初鹿, 明博, 48, "   "田村, 謙治, 49, "   "平沢, 勝栄, 72, "  
[67] "西田, 主税, 55, "   "新井, 杉生, 58, "   "菅, 直人, 71, "    
[70] "土屋, 正忠, 75, "   "鴇田, 敦, 51, "     "松本, 洋平, 44, "  
[73] "末松, 義規, 60, "   "佐々木, 里加, 50, " "杉下, 茂雄, 68, "  
[76] "木原, 誠二, 47, "   "宮本, 徹, 45, "     "鹿野, 晃, 44, "    
[79] "長島, 昭久, 55, "   "小田原, 潔, 53, "   "小糸, 健介, 35, "  
[82] "天木, 直人, 70, "   "伊藤, 達也, 56, "   "山花, 郁夫, 50, "  
[85] "金ケ崎, 絵美, 41, " "阿部, 真, 43, "     "小倉, 将信, 36, "  
[88] "伊藤, 俊輔, 38, "   "松村, 亮佑, 37, "   "萩生田, 光一, 54, "
[91] "高橋, 斉久, 44, "   "吉羽, 美華, 37, "   "飯田, 美弥子, 57, "
[94] "井上, 信治, 48, "   "山下, 容子, 58, "   "小沢, 鋭仁, 63, "  
[97] "井上, 宣, 43, "    

Use the gsub function to remove the commas.

tokyo_age <- gsub(",","",tokyo_age)
tokyo_age
 [1] "海江田 万里 68 " "山田 美樹 43 "   "松沢 香 39 "    
 [4] "原口 実季 28 "   "犬丸 光加 57 "   "又吉 光雄 73 "  
 [7] "辻 清人 38 "     "松尾 明弘 42 "   "鳩山 太郎 43 "  
[10] "石原 宏高 53 "   "松原 仁 61 "     "香西 克介 41 "  
[13] "平 将明 50 "     "井戸 正枝 51 "   "難波 美智代 43 "
[16] "青山 昂平 26 "   "若宮 健嗣 56 "   "手塚 仁雄 51 "  
[19] "福田 峰之 53 "   "落合 貴之 38 "   "越智 隆雄 53 "  
[22] "植松 恵美子 49 " "中岡 茉妃 26 "   "長妻 昭 57 "    
[25] "松本 文明 68 "   "荒木 章博 64 "   "井上 郁磨 26 "  
[28] "石原 伸晃 60 "   "吉田 晴美 45 "   "木内 孝胤 51 "  
[31] "長内 史子 29 "   "円 より子 70 "   "斎藤 郁真 29 "  
[34] "菅原 一秀 55 "   "高松 智之 43 "   "原 純子 53 "    
[37] "前田 吉成 62 "   "鈴木 隼人 40 "   "鈴木 庸介 41 "  
[40] "若狭 勝 60 "     "岸 良信 62 "     "小山 徹 42 "    
[43] "吉井 利光 35 "   "下村 博文 63 "   "前田 順一郎 42 "
[46] "宍戸 千絵 39 "   "小堤 東 28 "     "太田 昭宏 72 "  
[49] "池内 沙織 35 "   "中村 勝 66 "     "鴨下 一郎 68 "  
[52] "北條 智彦 34 "   "祖父江 元希 42 " "松島 みどり 61 "
[55] "矢作 麻子 39 "   "阿藤 和之 46 "   "清井 美穂 54 "  
[58] "大塚 紀久雄 76 " "秋元 司 46 "     "柿沢 未途 46 "  
[61] "吉田 年男 69 "   "猪野 隆 52 "     "大西 英男 71 "  
[64] "初鹿 明博 48 "   "田村 謙治 49 "   "平沢 勝栄 72 "  
[67] "西田 主税 55 "   "新井 杉生 58 "   "菅 直人 71 "    
[70] "土屋 正忠 75 "   "鴇田 敦 51 "     "松本 洋平 44 "  
[73] "末松 義規 60 "   "佐々木 里加 50 " "杉下 茂雄 68 "  
[76] "木原 誠二 47 "   "宮本 徹 45 "     "鹿野 晃 44 "    
[79] "長島 昭久 55 "   "小田原 潔 53 "   "小糸 健介 35 "  
[82] "天木 直人 70 "   "伊藤 達也 56 "   "山花 郁夫 50 "  
[85] "金ケ崎 絵美 41 " "阿部 真 43 "     "小倉 将信 36 "  
[88] "伊藤 俊輔 38 "   "松村 亮佑 37 "   "萩生田 光一 54 "
[91] "高橋 斉久 44 "   "吉羽 美華 37 "   "飯田 美弥子 57 "
[94] "井上 信治 48 "   "山下 容子 58 "   "小沢 鋭仁 63 "  
[97] "井上 宣 43 "    

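・The goal is to delete the names and keep only the digits.
・Some candidates have hiragana in their names.
・Remove the hiragana characters with the range [あ-ん].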

tokyo_age <- gsub("[あ-ん]","",tokyo_age)
tokyo_age
 [1] "海江田 万里 68 " "山田 美樹 43 "   "松沢 香 39 "    
 [4] "原口 実季 28 "   "犬丸 光加 57 "   "又吉 光雄 73 "  
 [7] "辻 清人 38 "     "松尾 明弘 42 "   "鳩山 太郎 43 "  
[10] "石原 宏高 53 "   "松原 仁 61 "     "香西 克介 41 "  
[13] "平 将明 50 "     "井戸 正枝 51 "   "難波 美智代 43 "
[16] "青山 昂平 26 "   "若宮 健嗣 56 "   "手塚 仁雄 51 "  
[19] "福田 峰之 53 "   "落合 貴之 38 "   "越智 隆雄 53 "  
[22] "植松 恵美子 49 " "中岡 茉妃 26 "   "長妻 昭 57 "    
[25] "松本 文明 68 "   "荒木 章博 64 "   "井上 郁磨 26 "  
[28] "石原 伸晃 60 "   "吉田 晴美 45 "   "木内 孝胤 51 "  
[31] "長内 史子 29 "   "円 子 70 "       "斎藤 郁真 29 "  
[34] "菅原 一秀 55 "   "高松 智之 43 "   "原 純子 53 "    
[37] "前田 吉成 62 "   "鈴木 隼人 40 "   "鈴木 庸介 41 "  
[40] "若狭 勝 60 "     "岸 良信 62 "     "小山 徹 42 "    
[43] "吉井 利光 35 "   "下村 博文 63 "   "前田 順一郎 42 "
[46] "宍戸 千絵 39 "   "小堤 東 28 "     "太田 昭宏 72 "  
[49] "池内 沙織 35 "   "中村 勝 66 "     "鴨下 一郎 68 "  
[52] "北條 智彦 34 "   "祖父江 元希 42 " "松島  61 "      
[55] "矢作 麻子 39 "   "阿藤 和之 46 "   "清井 美穂 54 "  
[58] "大塚 紀久雄 76 " "秋元 司 46 "     "柿沢 未途 46 "  
[61] "吉田 年男 69 "   "猪野 隆 52 "     "大西 英男 71 "  
[64] "初鹿 明博 48 "   "田村 謙治 49 "   "平沢 勝栄 72 "  
[67] "西田 主税 55 "   "新井 杉生 58 "   "菅 直人 71 "    
[70] "土屋 正忠 75 "   "鴇田 敦 51 "     "松本 洋平 44 "  
[73] "末松 義規 60 "   "佐々木 里加 50 " "杉下 茂雄 68 "  
[76] "木原 誠二 47 "   "宮本 徹 45 "     "鹿野 晃 44 "    
[79] "長島 昭久 55 "   "小田原 潔 53 "   "小糸 健介 35 "  
[82] "天木 直人 70 "   "伊藤 達也 56 "   "山花 郁夫 50 "  
[85] "金ケ崎 絵美 41 " "阿部 真 43 "     "小倉 将信 36 "  
[88] "伊藤 俊輔 38 "   "松村 亮佑 37 "   "萩生田 光一 54 "
[91] "高橋 斉久 44 "   "吉羽 美華 37 "   "飯田 美弥子 57 "
[94] "井上 信治 48 "   "山下 容子 58 "   "小沢 鋭仁 63 "  
[97] "井上 宣 43 "    

Use the gsub function to remove the half-width spaces (between the family and given names, and after the age).

tokyo_age <- gsub(" ","",tokyo_age)
tokyo_age
 [1] "海江田万里68" "山田美樹43"   "松沢香39"     "原口実季28"  
 [5] "犬丸光加57"   "又吉光雄73"   "辻清人38"     "松尾明弘42"  
 [9] "鳩山太郎43"   "石原宏高53"   "松原仁61"     "香西克介41"  
[13] "平将明50"     "井戸正枝51"   "難波美智代43" "青山昂平26"  
[17] "若宮健嗣56"   "手塚仁雄51"   "福田峰之53"   "落合貴之38"  
[21] "越智隆雄53"   "植松恵美子49" "中岡茉妃26"   "長妻昭57"    
[25] "松本文明68"   "荒木章博64"   "井上郁磨26"   "石原伸晃60"  
[29] "吉田晴美45"   "木内孝胤51"   "長内史子29"   "円子70"      
[33] "斎藤郁真29"   "菅原一秀55"   "高松智之43"   "原純子53"    
[37] "前田吉成62"   "鈴木隼人40"   "鈴木庸介41"   "若狭勝60"    
[41] "岸良信62"     "小山徹42"     "吉井利光35"   "下村博文63"  
[45] "前田順一郎42" "宍戸千絵39"   "小堤東28"     "太田昭宏72"  
[49] "池内沙織35"   "中村勝66"     "鴨下一郎68"   "北條智彦34"  
[53] "祖父江元希42" "松島61"       "矢作麻子39"   "阿藤和之46"  
[57] "清井美穂54"   "大塚紀久雄76" "秋元司46"     "柿沢未途46"  
[61] "吉田年男69"   "猪野隆52"     "大西英男71"   "初鹿明博48"  
[65] "田村謙治49"   "平沢勝栄72"   "西田主税55"   "新井杉生58"  
[69] "菅直人71"     "土屋正忠75"   "鴇田敦51"     "松本洋平44"  
[73] "末松義規60"   "佐々木里加50" "杉下茂雄68"   "木原誠二47"  
[77] "宮本徹45"     "鹿野晃44"     "長島昭久55"   "小田原潔53"  
[81] "小糸健介35"   "天木直人70"   "伊藤達也56"   "山花郁夫50"  
[85] "金ケ崎絵美41" "阿部真43"     "小倉将信36"   "伊藤俊輔38"  
[89] "松村亮佑37"   "萩生田光一54" "高橋斉久44"   "吉羽美華37"  
[93] "飯田美弥子57" "井上信治48"   "山下容子58"   "小沢鋭仁63"  
[97] "井上宣43"    

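・Some candidates have kanji numerals in their names.
Use the gsub function with the range [一-十] to remove them. Note that this range appears to cover every character between 一 and 十 in code point order, so it also removes many other kanji, as the output below shows (for example, "犬丸光加57" becomes "犬57").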

tokyo_age <- gsub("[一-十]", "", tokyo_age)
tokyo_age
 [1] "海江田里68"   "山田美樹43"   "松沢香39"     "原口実季28"  
 [5] "犬57"         "又吉雄73"     "辻清38"       "松尾明弘42"  
 [9] "鳩山太郎43"   "石原宏高53"   "松原61"       "香西41"      
[13] "平将明50"     "戸正枝51"     "難波美智43"   "青山昂平26"  
[17] "若宮嗣56"     "手塚雄51"     "福田峰53"     "落合貴38"    
[21] "越智隆雄53"   "植松恵美子49" "岡茉妃26"     "長妻昭57"    
[25] "松本文明68"   "荒木章博64"   "郁磨26"       "石原晃60"    
[29] "吉田晴美45"   "木孝胤51"     "長史子29"     "子70"        
[33] "斎藤郁真29"   "菅原秀55"     "高松智43"     "原純子53"    
[37] "田吉成62"     "鈴木隼40"     "鈴木庸41"     "若狭60"      
[41] "岸良62"       "小山徹42"     "吉35"         "村博文63"    
[45] "田順郎42"     "宍戸千絵39"   "小堤東28"     "太田昭宏72"  
[49] "池沙織35"     "村66"         "鴨郎68"       "條智彦34"    
[53] "祖父江希42"   "松島61"       "矢麻子39"     "阿藤和46"    
[57] "清美穂54"     "大塚紀雄76"   "秋司46"       "柿沢未途46"  
[61] "吉田年男69"   "猪野隆52"     "大西英男71"   "鹿明博48"    
[65] "田村謙治49"   "平沢栄72"     "西田税55"     "新杉生58"    
[69] "菅直71"       "土屋正忠75"   "鴇田敦51"     "松本洋平44"  
[73] "末松義規60"   "々木里50"     "杉茂雄68"     "木原誠47"    
[77] "宮本徹45"     "鹿野晃44"     "長島昭55"     "小田原潔53"  
[81] "小糸35"       "天木直70"     "藤達56"       "山花郁夫50"  
[85] "金ケ崎絵美41" "阿部真43"     "小将36"       "藤輔38"      
[89] "松村37"       "萩生田54"     "高橋斉44"     "吉羽美華37"  
[93] "飯田美弥子57" "治48"         "山容子58"     "小沢鋭63"    
[97] "宣43"        
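The output above loses far more than kanji numerals. One way to see why is to compare Unicode code points. The sketch below assumes that the regex range [一-十] is matched by code point order, which is consistent with the output above but is not stated in these notes. The names chars, points, and in_range are hypothetical.

```r
# 一 (U+4E00), 十 (U+5341), 丸 (U+4E38), 犬 (U+72AC), written as escapes
# so the example does not depend on the encoding of the source file.
chars <- c("\u4e00", "\u5341", "\u4e38", "\u72ac")

# Convert each single character to its integer code point.
points <- vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE)

# TRUE where the code point lies between 一 and 十 inclusive.
in_range <- points >= points[1] & points <= points[2]
in_range
```

丸 falls inside the range and 犬 falls outside it, which matches "犬丸光加57" being reduced to "犬57".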

Use the gsub function to remove the remaining kanji in the names.

tokyo_age <- gsub("[亜-黑]", "", tokyo_age)
tokyo_age
 [1] "68"   "43"   "39"   "28"   "57"   "73"   "38"   "42"   "43"   "53"  
[11] "61"   "41"   "50"   "51"   "43"   "26"   "56"   "51"   "53"   "38"  
[21] "53"   "49"   "26"   "57"   "68"   "64"   "26"   "60"   "45"   "51"  
[31] "29"   "70"   "29"   "55"   "43"   "53"   "62"   "40"   "41"   "60"  
[41] "62"   "42"   "35"   "63"   "42"   "39"   "28"   "72"   "35"   "66"  
[51] "68"   "34"   "42"   "61"   "39"   "46"   "54"   "76"   "46"   "46"  
[61] "69"   "52"   "71"   "48"   "49"   "72"   "55"   "58"   "71"   "75"  
[71] "51"   "44"   "60"   "々50" "68"   "47"   "45"   "44"   "55"   "53"  
[81] "35"   "70"   "56"   "50"   "ケ41" "43"   "36"   "38"   "37"   "54"  
[91] "44"   "37"   "57"   "48"   "58"   "63"   "43"  

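・Only two candidates still have a non-digit character left.
・Remove those characters, 「ケ」 and 「々」 (ケ is katakana and 々 is an iteration mark, so neither falls inside the kanji range used above).
Use the gsub function to remove 「ケ」.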

tokyo_age <- gsub("ケ", "", tokyo_age)
tokyo_age
 [1] "68"   "43"   "39"   "28"   "57"   "73"   "38"   "42"   "43"   "53"  
[11] "61"   "41"   "50"   "51"   "43"   "26"   "56"   "51"   "53"   "38"  
[21] "53"   "49"   "26"   "57"   "68"   "64"   "26"   "60"   "45"   "51"  
[31] "29"   "70"   "29"   "55"   "43"   "53"   "62"   "40"   "41"   "60"  
[41] "62"   "42"   "35"   "63"   "42"   "39"   "28"   "72"   "35"   "66"  
[51] "68"   "34"   "42"   "61"   "39"   "46"   "54"   "76"   "46"   "46"  
[61] "69"   "52"   "71"   "48"   "49"   "72"   "55"   "58"   "71"   "75"  
[71] "51"   "44"   "60"   "々50" "68"   "47"   "45"   "44"   "55"   "53"  
[81] "35"   "70"   "56"   "50"   "41"   "43"   "36"   "38"   "37"   "54"  
[91] "44"   "37"   "57"   "48"   "58"   "63"   "43"  

Use the gsub function to remove 「々」.

tokyo_age <- gsub("々", "", tokyo_age)
tokyo_age
 [1] "68" "43" "39" "28" "57" "73" "38" "42" "43" "53" "61" "41" "50" "51"
[15] "43" "26" "56" "51" "53" "38" "53" "49" "26" "57" "68" "64" "26" "60"
[29] "45" "51" "29" "70" "29" "55" "43" "53" "62" "40" "41" "60" "62" "42"
[43] "35" "63" "42" "39" "28" "72" "35" "66" "68" "34" "42" "61" "39" "46"
[57] "54" "76" "46" "46" "69" "52" "71" "48" "49" "72" "55" "58" "71" "75"
[71] "51" "44" "60" "50" "68" "47" "45" "44" "55" "53" "35" "70" "56" "50"
[85] "41" "43" "36" "38" "37" "54" "44" "37" "57" "48" "58" "63" "43"

・The age data alone has now been extracted.
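The chain of deletions above could likely be replaced by one pattern that captures only the digits inside the "age" span. This is a minimal sketch under the assumption that every line follows the layout of the sample lines below. The names html and age_once are hypothetical.

```r
# Two sample lines copied from the tokyo[age] output shown earlier.
html <- c(
  "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>",
  "<td class=\"namae\"><div><span class=\"sei\">円</span><span class=\"mei\">より子</span><span class=\"age\">(70)</span></div></td>"
)

# \\( and \\) match the literal parentheses around the age;
# the inner group ([0-9]+) captures the digits only.
age_once <- sub(".*class=\"age\">\\(([0-9]+)\\).*", "\\1", html)

# Convert from character to integer so the ages can be used in calculations.
age_once <- as.integer(age_once)
age_once
```

Since the names are never captured, no character ranges for hiragana or kanji are required, and characters such as 「ケ」 and 「々」 need no special handling.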

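2-4. Extract only the required information (vote count)

・Viewing the page source of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's vote count follows num, inside a td element with class "num".

・First, use the grep() function to find the lines in tokyo that contain vote counts, and store their line indices as count.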

count <- grep("<td class=\"num\"", tokyo)
tokyo[count]
 [1] "<td class=\"num\"><div>96,255<span>40.69<span class=\"tani\">%</span></span></div></td>" 
 [2] "<td class=\"num\"><div>93,234<span>39.41<span class=\"tani\">%</span></span></div></td>" 
 [3] "<td class=\"num\"><div>40,376<span>17.07<span class=\"tani\">%</span></span></div></td>" 
 [4] "<td class=\"num\"><div>3,806<span>1.61<span class=\"tani\">%</span></span></div></td>"   
 [5] "<td class=\"num\"><div>1,570<span>0.66<span class=\"tani\">%</span></span></div></td>"   
 [6] "<td class=\"num\"><div>1,307<span>0.55<span class=\"tani\">%</span></span></div></td>"   
 [7] "<td class=\"num\"><div>112,993<span>45.90<span class=\"tani\">%</span></span></div></td>"
 [8] "<td class=\"num\"><div>91,230<span>37.06<span class=\"tani\">%</span></span></div></td>" 
 [9] "<td class=\"num\"><div>41,955<span>17.04<span class=\"tani\">%</span></span></div></td>" 
[10] "<td class=\"num\"><div>107,708<span>43.58<span class=\"tani\">%</span></span></div></td>"
[11] "<td class=\"num\"><div>94,380<span>38.18<span class=\"tani\">%</span></span></div></td>" 
[12] "<td class=\"num\"><div>45,088<span>18.24<span class=\"tani\">%</span></span></div></td>" 
[13] "<td class=\"num\"><div>115,239<span>50.08<span class=\"tani\">%</span></span></div></td>"
[14] "<td class=\"num\"><div>53,480<span>23.24<span class=\"tani\">%</span></span></div></td>" 
[15] "<td class=\"num\"><div>35,352<span>15.36<span class=\"tani\">%</span></span></div></td>" 
[16] "<td class=\"num\"><div>26,037<span>11.32<span class=\"tani\">%</span></span></div></td>" 
[17] "<td class=\"num\"><div>101,314<span>41.15<span class=\"tani\">%</span></span></div></td>"
[18] "<td class=\"num\"><div>99,182<span>40.28<span class=\"tani\">%</span></span></div></td>" 
[19] "<td class=\"num\"><div>45,737<span>18.57<span class=\"tani\">%</span></span></div></td>" 
[20] "<td class=\"num\"><div>100,400<span>40.81<span class=\"tani\">%</span></span></div></td>"
[21] "<td class=\"num\"><div>98,422<span>40.01<span class=\"tani\">%</span></span></div></td>" 
[22] "<td class=\"num\"><div>42,862<span>17.42<span class=\"tani\">%</span></span></div></td>" 
[23] "<td class=\"num\"><div>4,307<span>1.75<span class=\"tani\">%</span></span></div></td>"   
[24] "<td class=\"num\"><div>117,118<span>50.52<span class=\"tani\">%</span></span></div></td>"
[25] "<td class=\"num\"><div>85,305<span>36.80<span class=\"tani\">%</span></span></div></td>" 
[26] "<td class=\"num\"><div>25,531<span>11.01<span class=\"tani\">%</span></span></div></td>" 
[27] "<td class=\"num\"><div>3,850<span>1.66<span class=\"tani\">%</span></span></div></td>"   
[28] "<td class=\"num\"><div>99,863<span>39.22<span class=\"tani\">%</span></span></div></td>" 
[29] "<td class=\"num\"><div>76,283<span>29.96<span class=\"tani\">%</span></span></div></td>" 
[30] "<td class=\"num\"><div>41,175<span>16.17<span class=\"tani\">%</span></span></div></td>" 
[31] "<td class=\"num\"><div>22,399<span>8.80<span class=\"tani\">%</span></span></div></td>"  
[32] "<td class=\"num\"><div>11,997<span>4.71<span class=\"tani\">%</span></span></div></td>"  
[33] "<td class=\"num\"><div>2,931<span>1.15<span class=\"tani\">%</span></span></div></td>"   
[34] "<td class=\"num\"><div>122,279<span>49.17<span class=\"tani\">%</span></span></div></td>"
[35] "<td class=\"num\"><div>64,731<span>26.03<span class=\"tani\">%</span></span></div></td>" 
[36] "<td class=\"num\"><div>57,439<span>23.10<span class=\"tani\">%</span></span></div></td>" 
[37] "<td class=\"num\"><div>4,243<span>1.71<span class=\"tani\">%</span></span></div></td>"   
[38] "<td class=\"num\"><div>91,146<span>37.37<span class=\"tani\">%</span></span></div></td>" 
[39] "<td class=\"num\"><div>70,168<span>28.77<span class=\"tani\">%</span></span></div></td>" 
[40] "<td class=\"num\"><div>57,901<span>23.74<span class=\"tani\">%</span></span></div></td>" 
[41] "<td class=\"num\"><div>20,828<span>8.54<span class=\"tani\">%</span></span></div></td>"  
[42] "<td class=\"num\"><div>2,107<span>0.86<span class=\"tani\">%</span></span></div></td>"   
[43] "<td class=\"num\"><div>1,744<span>0.72<span class=\"tani\">%</span></span></div></td>"   
[44] "<td class=\"num\"><div>104,612<span>44.90<span class=\"tani\">%</span></span></div></td>"
[45] "<td class=\"num\"><div>60,291<span>25.88<span class=\"tani\">%</span></span></div></td>" 
[46] "<td class=\"num\"><div>42,668<span>18.31<span class=\"tani\">%</span></span></div></td>" 
[47] "<td class=\"num\"><div>25,426<span>10.91<span class=\"tani\">%</span></span></div></td>" 
[48] "<td class=\"num\"><div>112,597<span>51.64<span class=\"tani\">%</span></span></div></td>"
[49] "<td class=\"num\"><div>83,544<span>38.32<span class=\"tani\">%</span></span></div></td>" 
[50] "<td class=\"num\"><div>21,892<span>10.04<span class=\"tani\">%</span></span></div></td>" 
[51] "<td class=\"num\"><div>120,744<span>55.23<span class=\"tani\">%</span></span></div></td>"
[52] "<td class=\"num\"><div>67,070<span>30.68<span class=\"tani\">%</span></span></div></td>" 
[53] "<td class=\"num\"><div>30,807<span>14.09<span class=\"tani\">%</span></span></div></td>" 
[54] "<td class=\"num\"><div>104,137<span>46.94<span class=\"tani\">%</span></span></div></td>"
[55] "<td class=\"num\"><div>63,235<span>28.50<span class=\"tani\">%</span></span></div></td>" 
[56] "<td class=\"num\"><div>46,600<span>21.00<span class=\"tani\">%</span></span></div></td>" 
[57] "<td class=\"num\"><div>4,282<span>1.93<span class=\"tani\">%</span></span></div></td>"   
[58] "<td class=\"num\"><div>3,607<span>1.63<span class=\"tani\">%</span></span></div></td>"   
[59] "<td class=\"num\"><div>101,155<span>45.55<span class=\"tani\">%</span></span></div></td>"
[60] "<td class=\"num\"><div>70,325<span>31.67<span class=\"tani\">%</span></span></div></td>" 
[61] "<td class=\"num\"><div>34,943<span>15.73<span class=\"tani\">%</span></span></div></td>" 
[62] "<td class=\"num\"><div>15,667<span>7.05<span class=\"tani\">%</span></span></div></td>"  
[63] "<td class=\"num\"><div>84,457<span>40.91<span class=\"tani\">%</span></span></div></td>" 
[64] "<td class=\"num\"><div>71,405<span>34.59<span class=\"tani\">%</span></span></div></td>" 
[65] "<td class=\"num\"><div>50,568<span>24.50<span class=\"tani\">%</span></span></div></td>" 
[66] "<td class=\"num\"><div>127,632<span>57.95<span class=\"tani\">%</span></span></div></td>"
[67] "<td class=\"num\"><div>49,485<span>22.47<span class=\"tani\">%</span></span></div></td>" 
[68] "<td class=\"num\"><div>43,138<span>19.59<span class=\"tani\">%</span></span></div></td>" 
[69] "<td class=\"num\"><div>96,713<span>40.73<span class=\"tani\">%</span></span></div></td>" 
[70] "<td class=\"num\"><div>95,667<span>40.29<span class=\"tani\">%</span></span></div></td>" 
[71] "<td class=\"num\"><div>45,081<span>18.98<span class=\"tani\">%</span></span></div></td>" 
[72] "<td class=\"num\"><div>96,229<span>41.14<span class=\"tani\">%</span></span></div></td>" 
[73] "<td class=\"num\"><div>90,540<span>38.71<span class=\"tani\">%</span></span></div></td>" 
[74] "<td class=\"num\"><div>29,743<span>12.72<span class=\"tani\">%</span></span></div></td>" 
[75] "<td class=\"num\"><div>17,377<span>7.43<span class=\"tani\">%</span></span></div></td>"  
[76] "<td class=\"num\"><div>107,686<span>49.89<span class=\"tani\">%</span></span></div></td>"
[77] "<td class=\"num\"><div>57,741<span>26.75<span class=\"tani\">%</span></span></div></td>" 
[78] "<td class=\"num\"><div>50,439<span>23.37<span class=\"tani\">%</span></span></div></td>" 
[79] "<td class=\"num\"><div>92,356<span>40.97<span class=\"tani\">%</span></span></div></td>" 
[80] "<td class=\"num\"><div>88,225<span>39.14<span class=\"tani\">%</span></span></div></td>" 
[81] "<td class=\"num\"><div>38,195<span>16.94<span class=\"tani\">%</span></span></div></td>" 
[82] "<td class=\"num\"><div>6,655<span>2.95<span class=\"tani\">%</span></span></div></td>"   
[83] "<td class=\"num\"><div>110,493<span>43.39<span class=\"tani\">%</span></span></div></td>"
[84] "<td class=\"num\"><div>91,073<span>35.76<span class=\"tani\">%</span></span></div></td>" 
[85] "<td class=\"num\"><div>30,236<span>11.87<span class=\"tani\">%</span></span></div></td>" 
[86] "<td class=\"num\"><div>22,859<span>8.98<span class=\"tani\">%</span></span></div></td>"  
[87] "<td class=\"num\"><div>110,522<span>44.95<span class=\"tani\">%</span></span></div></td>"
[88] "<td class=\"num\"><div>76,450<span>31.09<span class=\"tani\">%</span></span></div></td>" 
[89] "<td class=\"num\"><div>58,929<span>23.96<span class=\"tani\">%</span></span></div></td>" 
[90] "<td class=\"num\"><div>122,331<span>49.32<span class=\"tani\">%</span></span></div></td>"
[91] "<td class=\"num\"><div>61,441<span>24.77<span class=\"tani\">%</span></span></div></td>" 
[92] "<td class=\"num\"><div>39,892<span>16.08<span class=\"tani\">%</span></span></div></td>" 
[93] "<td class=\"num\"><div>24,349<span>9.82<span class=\"tani\">%</span></span></div></td>"  
[94] "<td class=\"num\"><div>112,014<span>51.81<span class=\"tani\">%</span></span></div></td>"
[95] "<td class=\"num\"><div>44,884<span>20.76<span class=\"tani\">%</span></span></div></td>" 
[96] "<td class=\"num\"><div>38,286<span>17.71<span class=\"tani\">%</span></span></div></td>" 
[97] "<td class=\"num\"><div>21,031<span>9.73<span class=\"tani\">%</span></span></div></td>"  

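Use the gsub() function to remove everything other than the vote count, and name the result tokyo_count. The vote-share percentage is dropped as well, leaving the vote count followed by a colon.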

tokyo_count <- gsub("<td class=\"num\"><div>(.*)<span>[0-9.]+(.*)<span class=\"tani\">%</span></span></div></td>", "\\1:\\2", tokyo[count])
tokyo_count
 [1] "96,255:"  "93,234:"  "40,376:"  "3,806:"   "1,570:"   "1,307:"  
 [7] "112,993:" "91,230:"  "41,955:"  "107,708:" "94,380:"  "45,088:" 
[13] "115,239:" "53,480:"  "35,352:"  "26,037:"  "101,314:" "99,182:" 
[19] "45,737:"  "100,400:" "98,422:"  "42,862:"  "4,307:"   "117,118:"
[25] "85,305:"  "25,531:"  "3,850:"   "99,863:"  "76,283:"  "41,175:" 
[31] "22,399:"  "11,997:"  "2,931:"   "122,279:" "64,731:"  "57,439:" 
[37] "4,243:"   "91,146:"  "70,168:"  "57,901:"  "20,828:"  "2,107:"  
[43] "1,744:"   "104,612:" "60,291:"  "42,668:"  "25,426:"  "112,597:"
[49] "83,544:"  "21,892:"  "120,744:" "67,070:"  "30,807:"  "104,137:"
[55] "63,235:"  "46,600:"  "4,282:"   "3,607:"   "101,155:" "70,325:" 
[61] "34,943:"  "15,667:"  "84,457:"  "71,405:"  "50,568:"  "127,632:"
[67] "49,485:"  "43,138:"  "96,713:"  "95,667:"  "45,081:"  "96,229:" 
[73] "90,540:"  "29,743:"  "17,377:"  "107,686:" "57,741:"  "50,439:" 
[79] "92,356:"  "88,225:"  "38,195:"  "6,655:"   "110,493:" "91,073:" 
[85] "30,236:"  "22,859:"  "110,522:" "76,450:"  "58,929:"  "122,331:"
[91] "61,441:"  "39,892:"  "24,349:"  "112,014:" "44,884:"  "38,286:" 
[97] "21,031:" 

Use the gsub() function to remove the trailing colon ":" by replacing it with an empty string.

tokyo_count <- gsub(":", "", tokyo_count)
tokyo_count
 [1] "96,255"  "93,234"  "40,376"  "3,806"   "1,570"   "1,307"   "112,993"
 [8] "91,230"  "41,955"  "107,708" "94,380"  "45,088"  "115,239" "53,480" 
[15] "35,352"  "26,037"  "101,314" "99,182"  "45,737"  "100,400" "98,422" 
[22] "42,862"  "4,307"   "117,118" "85,305"  "25,531"  "3,850"   "99,863" 
[29] "76,283"  "41,175"  "22,399"  "11,997"  "2,931"   "122,279" "64,731" 
[36] "57,439"  "4,243"   "91,146"  "70,168"  "57,901"  "20,828"  "2,107"  
[43] "1,744"   "104,612" "60,291"  "42,668"  "25,426"  "112,597" "83,544" 
[50] "21,892"  "120,744" "67,070"  "30,807"  "104,137" "63,235"  "46,600" 
[57] "4,282"   "3,607"   "101,155" "70,325"  "34,943"  "15,667"  "84,457" 
[64] "71,405"  "50,568"  "127,632" "49,485"  "43,138"  "96,713"  "95,667" 
[71] "45,081"  "96,229"  "90,540"  "29,743"  "17,377"  "107,686" "57,741" 
[78] "50,439"  "92,356"  "88,225"  "38,195"  "6,655"   "110,493" "91,073" 
[85] "30,236"  "22,859"  "110,522" "76,450"  "58,929"  "122,331" "61,441" 
[92] "39,892"  "24,349"  "112,014" "44,884"  "38,286"  "21,031" 

Use the gsub() function to remove the commas by replacing them with an empty string.

tokyo_count <- gsub(",", "", tokyo_count)
tokyo_count
 [1] "96255"  "93234"  "40376"  "3806"   "1570"   "1307"   "112993"
 [8] "91230"  "41955"  "107708" "94380"  "45088"  "115239" "53480" 
[15] "35352"  "26037"  "101314" "99182"  "45737"  "100400" "98422" 
[22] "42862"  "4307"   "117118" "85305"  "25531"  "3850"   "99863" 
[29] "76283"  "41175"  "22399"  "11997"  "2931"   "122279" "64731" 
[36] "57439"  "4243"   "91146"  "70168"  "57901"  "20828"  "2107"  
[43] "1744"   "104612" "60291"  "42668"  "25426"  "112597" "83544" 
[50] "21892"  "120744" "67070"  "30807"  "104137" "63235"  "46600" 
[57] "4282"   "3607"   "101155" "70325"  "34943"  "15667"  "84457" 
[64] "71405"  "50568"  "127632" "49485"  "43138"  "96713"  "95667" 
[71] "45081"  "96229"  "90540"  "29743"  "17377"  "107686" "57741" 
[78] "50439"  "92356"  "88225"  "38195"  "6655"   "110493" "91073" 
[85] "30236"  "22859"  "110522" "76450"  "58929"  "122331" "61441" 
[92] "39892"  "24349"  "112014" "44884"  "38286"  "21031" 

・This leaves only the candidates' vote counts.
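
・The three gsub() steps above can also be combined. The following is a minimal sketch, assuming two hypothetical sample lines in the same format as the output shown above, that captures only the digits and commas after <div> and then converts them to integers. The names lines, votes_chr, and votes are illustrative and are not part of the original code.

```r
# Hypothetical sample lines in the same format as tokyo[count]
lines <- c(
  "<td class=\"num\"><div>96,255<span>40.69<span class=\"tani\">%</span></span></div></td>",
  "<td class=\"num\"><div>3,806<span>1.61<span class=\"tani\">%</span></span></div></td>"
)

# Keep only the digits and commas that follow <div>; drop the rest of the line
votes_chr <- sub(".*<div>([0-9,]+)<span>.*", "\\1", lines)

# Remove the thousands separators and convert to integer
votes <- as.integer(gsub(",", "", votes_chr))
votes
```

・Converting to integer allows the values to be summed or sorted numerically, which is not possible while they are character strings.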

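2-5. Selecting only the needed information (party)

・Looking at the page source ("View Source") of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's party affiliation appears after party.

Use the grep() function to find the lines of tokyo that contain the candidates' party affiliation, and name the resulting line indices party.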

party <- grep("<td class=\"party\"", tokyo)
tokyo[party]
 [1] "<td class=\"party\"><div>立憲</div></td>"
 [2] "<td class=\"party\"><div>自民</div></td>"
 [3] "<td class=\"party\"><div>希望</div></td>"
 [4] "<td class=\"party\"><div>諸派</div></td>"
 [5] "<td class=\"party\"><div>諸派</div></td>"
 [6] "<td class=\"party\"><div>諸派</div></td>"
 [7] "<td class=\"party\"><div>自民</div></td>"
 [8] "<td class=\"party\"><div>立憲</div></td>"
 [9] "<td class=\"party\"><div>希望</div></td>"
[10] "<td class=\"party\"><div>自民</div></td>"
[11] "<td class=\"party\"><div>希望</div></td>"
[12] "<td class=\"party\"><div>共産</div></td>"
[13] "<td class=\"party\"><div>自民</div></td>"
[14] "<td class=\"party\"><div>立憲</div></td>"
[15] "<td class=\"party\"><div>希望</div></td>"
[16] "<td class=\"party\"><div>共産</div></td>"
[17] "<td class=\"party\"><div>自民</div></td>"
[18] "<td class=\"party\"><div>立憲</div></td>"
[19] "<td class=\"party\"><div>希望</div></td>"
[20] "<td class=\"party\"><div>立憲</div></td>"
[21] "<td class=\"party\"><div>自民</div></td>"
[22] "<td class=\"party\"><div>希望</div></td>"
[23] "<td class=\"party\"><div>諸派</div></td>"
[24] "<td class=\"party\"><div>立憲</div></td>"
[25] "<td class=\"party\"><div>自民</div></td>"
[26] "<td class=\"party\"><div>希望</div></td>"
[27] "<td class=\"party\"><div>無所</div></td>"
[28] "<td class=\"party\"><div>自民</div></td>"
[29] "<td class=\"party\"><div>立憲</div></td>"
[30] "<td class=\"party\"><div>希望</div></td>"
[31] "<td class=\"party\"><div>共産</div></td>"
[32] "<td class=\"party\"><div>無所</div></td>"
[33] "<td class=\"party\"><div>諸派</div></td>"
[34] "<td class=\"party\"><div>自民</div></td>"
[35] "<td class=\"party\"><div>希望</div></td>"
[36] "<td class=\"party\"><div>共産</div></td>"
[37] "<td class=\"party\"><div>無所</div></td>"
[38] "<td class=\"party\"><div>自民</div></td>"
[39] "<td class=\"party\"><div>立憲</div></td>"
[40] "<td class=\"party\"><div>希望</div></td>"
[41] "<td class=\"party\"><div>共産</div></td>"
[42] "<td class=\"party\"><div>無所</div></td>"
[43] "<td class=\"party\"><div>諸派</div></td>"
[44] "<td class=\"party\"><div>自民</div></td>"
[45] "<td class=\"party\"><div>立憲</div></td>"
[46] "<td class=\"party\"><div>希望</div></td>"
[47] "<td class=\"party\"><div>共産</div></td>"
[48] "<td class=\"party\"><div>公明</div></td>"
[49] "<td class=\"party\"><div>共産</div></td>"
[50] "<td class=\"party\"><div>諸派</div></td>"
[51] "<td class=\"party\"><div>自民</div></td>"
[52] "<td class=\"party\"><div>立憲</div></td>"
[53] "<td class=\"party\"><div>共産</div></td>"
[54] "<td class=\"party\"><div>自民</div></td>"
[55] "<td class=\"party\"><div>希望</div></td>"
[56] "<td class=\"party\"><div>共産</div></td>"
[57] "<td class=\"party\"><div>諸派</div></td>"
[58] "<td class=\"party\"><div>無所</div></td>"
[59] "<td class=\"party\"><div>自民</div></td>"
[60] "<td class=\"party\"><div>希望</div></td>"
[61] "<td class=\"party\"><div>共産</div></td>"
[62] "<td class=\"party\"><div>無所</div></td>"
[63] "<td class=\"party\"><div>自民</div></td>"
[64] "<td class=\"party\"><div>立憲</div></td>"
[65] "<td class=\"party\"><div>希望</div></td>"
[66] "<td class=\"party\"><div>自民</div></td>"
[67] "<td class=\"party\"><div>希望</div></td>"
[68] "<td class=\"party\"><div>共産</div></td>"
[69] "<td class=\"party\"><div>立憲</div></td>"
[70] "<td class=\"party\"><div>自民</div></td>"
[71] "<td class=\"party\"><div>希望</div></td>"
[72] "<td class=\"party\"><div>自民</div></td>"
[73] "<td class=\"party\"><div>立憲</div></td>"
[74] "<td class=\"party\"><div>希望</div></td>"
[75] "<td class=\"party\"><div>共産</div></td>"
[76] "<td class=\"party\"><div>自民</div></td>"
[77] "<td class=\"party\"><div>共産</div></td>"
[78] "<td class=\"party\"><div>希望</div></td>"
[79] "<td class=\"party\"><div>希望</div></td>"
[80] "<td class=\"party\"><div>自民</div></td>"
[81] "<td class=\"party\"><div>社民</div></td>"
[82] "<td class=\"party\"><div>諸派</div></td>"
[83] "<td class=\"party\"><div>自民</div></td>"
[84] "<td class=\"party\"><div>立憲</div></td>"
[85] "<td class=\"party\"><div>希望</div></td>"
[86] "<td class=\"party\"><div>共産</div></td>"
[87] "<td class=\"party\"><div>自民</div></td>"
[88] "<td class=\"party\"><div>希望</div></td>"
[89] "<td class=\"party\"><div>共産</div></td>"
[90] "<td class=\"party\"><div>自民</div></td>"
[91] "<td class=\"party\"><div>立憲</div></td>"
[92] "<td class=\"party\"><div>希望</div></td>"
[93] "<td class=\"party\"><div>共産</div></td>"
[94] "<td class=\"party\"><div>自民</div></td>"
[95] "<td class=\"party\"><div>立憲</div></td>"
[96] "<td class=\"party\"><div>希望</div></td>"
[97] "<td class=\"party\"><div>共産</div></td>"

Use the gsub() function to remove everything other than the party name, and name the result tokyo_party.

tokyo_party <- gsub("<td class=\"party\"><div>(.*)</div></td>","\\1", tokyo[party])
tokyo_party
 [1] "立憲" "自民" "希望" "諸派" "諸派" "諸派" "自民" "立憲" "希望" "自民"
[11] "希望" "共産" "自民" "立憲" "希望" "共産" "自民" "立憲" "希望" "立憲"
[21] "自民" "希望" "諸派" "立憲" "自民" "希望" "無所" "自民" "立憲" "希望"
[31] "共産" "無所" "諸派" "自民" "希望" "共産" "無所" "自民" "立憲" "希望"
[41] "共産" "無所" "諸派" "自民" "立憲" "希望" "共産" "公明" "共産" "諸派"
[51] "自民" "立憲" "共産" "自民" "希望" "共産" "諸派" "無所" "自民" "希望"
[61] "共産" "無所" "自民" "立憲" "希望" "自民" "希望" "共産" "立憲" "自民"
[71] "希望" "自民" "立憲" "希望" "共産" "自民" "共産" "希望" "希望" "自民"
[81] "社民" "諸派" "自民" "立憲" "希望" "共産" "自民" "希望" "共産" "自民"
[91] "立憲" "希望" "共産" "自民" "立憲" "希望" "共産"
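
・The grep() and gsub() pair follows the same pattern for every table cell that holds plain text. The following is a minimal sketch of wrapping that pattern in a function. The helper extract_cell() and the vector sample_lines are hypothetical and are not part of the original code.

```r
# Hypothetical helper: return the text inside <div>...</div>
# for table cells of the given class
extract_cell <- function(lines, class) {
  pattern <- paste0("<td class=\"", class, "\"><div>(.*)</div></td>")
  hits <- grep(pattern, lines, value = TRUE)  # keep matching lines only
  sub(pattern, "\\1", hits)                   # keep the captured text only
}

# Hypothetical sample lines in the same format as the page source
sample_lines <- c(
  "<td class=\"party\"><div>立憲</div></td>",
  "<td class=\"status\"><div>元</div></td>",
  "<td class=\"party\"><div>自民</div></td>"
)

extract_cell(sample_lines, "party")
```

・This helper suits cells such as party and status. Cells that contain nested tags, such as num and tosenkaisu, still need their own patterns.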

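2-6. Selecting only the needed information (number of previous wins)

・Looking at the page source ("View Source") of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's number of previous election wins appears after tosenkaisu.

Use the grep() function to find the lines of tokyo that contain the candidates' number of previous wins, and name the resulting line indices previous.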

previous <- grep("<td class=\"tosenkaisu\"", tokyo)
tokyo[previous]
 [1] "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>" 
 [2] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
 [3] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
 [4] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
 [5] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
 [6] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
 [7] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
 [8] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
 [9] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[10] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[11] "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>" 
[12] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[13] "<td class=\"tosenkaisu\"><div>5<span>回</span></div></td>" 
[14] "<td class=\"tosenkaisu\"><div>1<span>回</span></div></td>" 
[15] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[16] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[17] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[18] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[19] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[20] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>" 
[21] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[22] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[23] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[24] "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>" 
[25] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[26] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[27] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[28] "<td class=\"tosenkaisu\"><div>10<span>回</span></div></td>"
[29] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[30] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>" 
[31] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[32] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[33] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[34] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>" 
[35] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[36] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[37] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[38] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>" 
[39] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[40] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>" 
[41] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[42] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[43] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[44] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>" 
[45] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[46] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[47] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[48] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>" 
[49] "<td class=\"tosenkaisu\"><div>1<span>回</span></div></td>" 
[50] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[51] "<td class=\"tosenkaisu\"><div>9<span>回</span></div></td>" 
[52] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[53] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[54] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>" 
[55] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[56] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[57] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[58] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[59] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[60] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[61] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[62] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[63] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[64] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[65] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[66] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>" 
[67] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[68] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[69] "<td class=\"tosenkaisu\"><div>13<span>回</span></div></td>"
[70] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[71] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[72] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[73] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>" 
[74] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[75] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[76] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[77] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>" 
[78] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[79] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>" 
[80] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[81] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[82] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[83] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>" 
[84] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>" 
[85] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[86] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[87] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>" 
[88] "<td class=\"tosenkaisu\"><div>1<span>回</span></div></td>" 
[89] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[90] "<td class=\"tosenkaisu\"><div>5<span>回</span></div></td>" 
[91] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[92] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[93] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[94] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>" 
[95] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 
[96] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>" 
[97] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>" 

Use the gsub() function to remove everything other than the number of previous wins, and name the result tokyo_previous.

tokyo_previous <- gsub("<td class=\"tosenkaisu\"><div>(.*)<span>回</span></div></td>","\\1", tokyo[previous])
tokyo_previous
 [1] "7"  "3"  "0"  "0"  "0"  "0"  "3"  "0"  "0"  "4"  "7"  "0"  "5"  "1" 
[15] "0"  "0"  "4"  "4"  "3"  "2"  "4"  "0"  "0"  "7"  "4"  "0"  "0"  "10"
[29] "0"  "2"  "0"  "0"  "0"  "6"  "0"  "0"  "0"  "2"  "0"  "2"  "0"  "0" 
[43] "0"  "8"  "0"  "0"  "0"  "8"  "1"  "0"  "9"  "0"  "0"  "6"  "0"  "0" 
[57] "0"  "0"  "3"  "4"  "0"  "0"  "3"  "3"  "3"  "8"  "0"  "0"  "13" "3" 
[71] "0"  "4"  "6"  "0"  "0"  "4"  "2"  "0"  "6"  "3"  "0"  "0"  "8"  "4" 
[85] "0"  "0"  "3"  "1"  "0"  "5"  "0"  "0"  "0"  "6"  "0"  "8"  "0" 
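
・As an alternative to writing out the full tag pattern, the first run of digits on each line can be extracted directly. The following is a minimal sketch using regexpr() and regmatches(), assuming three hypothetical sample lines in the same format as the output shown above. The names sample_lines and wins are illustrative.

```r
# Hypothetical sample lines in the same format as tokyo[previous]
sample_lines <- c(
  "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>",
  "<td class=\"tosenkaisu\"><div>13<span>回</span></div></td>",
  "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
)

# regexpr() locates the first run of digits in each line;
# regmatches() returns the matched text, which is then converted to integer
wins <- as.integer(regmatches(sample_lines, regexpr("[0-9]+", sample_lines)))
wins
```

・Note that regmatches() silently drops lines that contain no match, so the length of the result should be checked against the number of input lines.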

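2-7. Selecting only the needed information (status)

・Looking at the page source ("View Source") of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's status (新 = new candidate, 前 = incumbent, 元 = former member) appears after status.

Use the grep() function to find the lines of tokyo that contain the candidates' status, and name the resulting line indices status.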

status <- grep("<td class=\"status\"", tokyo)
tokyo[status]
 [1] "<td class=\"status\"><div>元</div></td>"
 [2] "<td class=\"status\"><div>前</div></td>"
 [3] "<td class=\"status\"><div>新</div></td>"
 [4] "<td class=\"status\"><div>新</div></td>"
 [5] "<td class=\"status\"><div>新</div></td>"
 [6] "<td class=\"status\"><div>新</div></td>"
 [7] "<td class=\"status\"><div>前</div></td>"
 [8] "<td class=\"status\"><div>新</div></td>"
 [9] "<td class=\"status\"><div>新</div></td>"
[10] "<td class=\"status\"><div>前</div></td>"
[11] "<td class=\"status\"><div>前</div></td>"
[12] "<td class=\"status\"><div>新</div></td>"
[13] "<td class=\"status\"><div>前</div></td>"
[14] "<td class=\"status\"><div>元</div></td>"
[15] "<td class=\"status\"><div>新</div></td>"
[16] "<td class=\"status\"><div>新</div></td>"
[17] "<td class=\"status\"><div>前</div></td>"
[18] "<td class=\"status\"><div>元</div></td>"
[19] "<td class=\"status\"><div>前</div></td>"
[20] "<td class=\"status\"><div>前</div></td>"
[21] "<td class=\"status\"><div>前</div></td>"
[22] "<td class=\"status\"><div>新</div></td>"
[23] "<td class=\"status\"><div>新</div></td>"
[24] "<td class=\"status\"><div>前</div></td>"
[25] "<td class=\"status\"><div>前</div></td>"
[26] "<td class=\"status\"><div>新</div></td>"
[27] "<td class=\"status\"><div>新</div></td>"
[28] "<td class=\"status\"><div>前</div></td>"
[29] "<td class=\"status\"><div>新</div></td>"
[30] "<td class=\"status\"><div>前</div></td>"
[31] "<td class=\"status\"><div>新</div></td>"
[32] "<td class=\"status\"><div>新</div></td>"
[33] "<td class=\"status\"><div>新</div></td>"
[34] "<td class=\"status\"><div>前</div></td>"
[35] "<td class=\"status\"><div>新</div></td>"
[36] "<td class=\"status\"><div>新</div></td>"
[37] "<td class=\"status\"><div>新</div></td>"
[38] "<td class=\"status\"><div>前</div></td>"
[39] "<td class=\"status\"><div>新</div></td>"
[40] "<td class=\"status\"><div>前</div></td>"
[41] "<td class=\"status\"><div>新</div></td>"
[42] "<td class=\"status\"><div>新</div></td>"
[43] "<td class=\"status\"><div>新</div></td>"
[44] "<td class=\"status\"><div>前</div></td>"
[45] "<td class=\"status\"><div>新</div></td>"
[46] "<td class=\"status\"><div>新</div></td>"
[47] "<td class=\"status\"><div>新</div></td>"
[48] "<td class=\"status\"><div>前</div></td>"
[49] "<td class=\"status\"><div>前</div></td>"
[50] "<td class=\"status\"><div>新</div></td>"
[51] "<td class=\"status\"><div>前</div></td>"
[52] "<td class=\"status\"><div>新</div></td>"
[53] "<td class=\"status\"><div>新</div></td>"
[54] "<td class=\"status\"><div>前</div></td>"
[55] "<td class=\"status\"><div>新</div></td>"
[56] "<td class=\"status\"><div>新</div></td>"
[57] "<td class=\"status\"><div>新</div></td>"
[58] "<td class=\"status\"><div>新</div></td>"
[59] "<td class=\"status\"><div>前</div></td>"
[60] "<td class=\"status\"><div>前</div></td>"
[61] "<td class=\"status\"><div>新</div></td>"
[62] "<td class=\"status\"><div>新</div></td>"
[63] "<td class=\"status\"><div>前</div></td>"
[64] "<td class=\"status\"><div>前</div></td>"
[65] "<td class=\"status\"><div>元</div></td>"
[66] "<td class=\"status\"><div>前</div></td>"
[67] "<td class=\"status\"><div>新</div></td>"
[68] "<td class=\"status\"><div>新</div></td>"
[69] "<td class=\"status\"><div>前</div></td>"
[70] "<td class=\"status\"><div>前</div></td>"
[71] "<td class=\"status\"><div>新</div></td>"
[72] "<td class=\"status\"><div>前</div></td>"
[73] "<td class=\"status\"><div>元</div></td>"
[74] "<td class=\"status\"><div>新</div></td>"
[75] "<td class=\"status\"><div>新</div></td>"
[76] "<td class=\"status\"><div>前</div></td>"
[77] "<td class=\"status\"><div>前</div></td>"
[78] "<td class=\"status\"><div>新</div></td>"
[79] "<td class=\"status\"><div>前</div></td>"
[80] "<td class=\"status\"><div>前</div></td>"
[81] "<td class=\"status\"><div>新</div></td>"
[82] "<td class=\"status\"><div>新</div></td>"
[83] "<td class=\"status\"><div>前</div></td>"
[84] "<td class=\"status\"><div>元</div></td>"
[85] "<td class=\"status\"><div>新</div></td>"
[86] "<td class=\"status\"><div>新</div></td>"
[87] "<td class=\"status\"><div>前</div></td>"
[88] "<td class=\"status\"><div>新</div></td>"
[89] "<td class=\"status\"><div>新</div></td>"
[90] "<td class=\"status\"><div>前</div></td>"
[91] "<td class=\"status\"><div>新</div></td>"
[92] "<td class=\"status\"><div>新</div></td>"
[93] "<td class=\"status\"><div>新</div></td>"
[94] "<td class=\"status\"><div>前</div></td>"
[95] "<td class=\"status\"><div>新</div></td>"
[96] "<td class=\"status\"><div>前</div></td>"
[97] "<td class=\"status\"><div>新</div></td>"

Use the gsub() function to remove everything other than the candidates' status, and name the result tokyo_status.

tokyo_status <- gsub("<td class=\"status\"><div>(.*)</div></td>","\\1", tokyo[status])
tokyo_status
 [1] "元" "前" "新" "新" "新" "新" "前" "新" "新" "前" "前" "新" "前" "元"
[15] "新" "新" "前" "元" "前" "前" "前" "新" "新" "前" "前" "新" "新" "前"
[29] "新" "前" "新" "新" "新" "前" "新" "新" "新" "前" "新" "前" "新" "新"
[43] "新" "前" "新" "新" "新" "前" "前" "新" "前" "新" "新" "前" "新" "新"
[57] "新" "新" "前" "前" "新" "新" "前" "前" "元" "前" "新" "新" "前" "前"
[71] "新" "前" "元" "新" "新" "前" "前" "新" "前" "前" "新" "新" "前" "元"
[85] "新" "新" "前" "新" "新" "前" "新" "新" "新" "前" "新" "前" "新"
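
・Because status takes only three values, it can be stored as a factor and tabulated. The following is a minimal sketch, assuming a small hypothetical vector status_chr in place of the scraped data. The ordering of the levels is an illustrative choice.

```r
# Hypothetical sample standing in for tokyo_status (not the real data)
status_chr <- c("元", "前", "新", "新", "前")

# Store as a factor with an explicit level order: new, incumbent, former
status_fct <- factor(status_chr, levels = c("新", "前", "元"))

# Count the candidates in each category
table(status_fct)
```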

2-8. Building the data frame

・Combine the six extracted vectors into a data frame and name it df.hr.tokyo.

df.hr.tokyo <- data.frame(name = tokyo_name,
                          age = tokyo_age,
                          count = tokyo_count,
                          party = tokyo_party,
                          status = tokyo_status,
                          previous = tokyo_previous)
df.hr.tokyo
         name age  count party status previous
1  海江田万里  68  96255  立憲     元        7
2    山田美樹  43  93234  自民     前        3
3      松沢香  39  40376  希望     新        0
4    原口実季  28   3806  諸派     新        0
5    犬丸光加  57   1570  諸派     新        0
6    又吉光雄  73   1307  諸派     新        0
7      辻清人  38 112993  自民     前        3
8    松尾明弘  42  91230  立憲     新        0
9    鳩山太郎  43  41955  希望     新        0
10   石原宏高  53 107708  自民     前        4
11     松原仁  61  94380  希望     前        7
12   香西克介  41  45088  共産     新        0
13     平将明  50 115239  自民     前        5
14   井戸正枝  51  53480  立憲     元        1
15 難波美智代  43  35352  希望     新        0
16   青山昂平  26  26037  共産     新        0
17   若宮健嗣  56 101314  自民     前        4
18   手塚仁雄  51  99182  立憲     元        4
19   福田峰之  53  45737  希望     前        3
20   落合貴之  38 100400  立憲     前        2
21   越智隆雄  53  98422  自民     前        4
22 植松恵美子  49  42862  希望     新        0
23   中岡茉妃  26   4307  諸派     新        0
24     長妻昭  57 117118  立憲     前        7
25   松本文明  68  85305  自民     前        4
26   荒木章博  64  25531  希望     新        0
27   井上郁磨  26   3850  無所     新        0
28   石原伸晃  60  99863  自民     前       10
29   吉田晴美  45  76283  立憲     新        0
30   木内孝胤  51  41175  希望     前        2
31   長内史子  29  22399  共産     新        0
32   円より子  70  11997  無所     新        0
33   斎藤郁真  29   2931  諸派     新        0
34   菅原一秀  55 122279  自民     前        6
35   高松智之  43  64731  希望     新        0
36     原純子  53  57439  共産     新        0
37   前田吉成  62   4243  無所     新        0
38   鈴木隼人  40  91146  自民     前        2
39   鈴木庸介  41  70168  立憲     新        0
40     若狭勝  60  57901  希望     前        2
41     岸良信  62  20828  共産     新        0
42     小山徹  42   2107  無所     新        0
43   吉井利光  35   1744  諸派     新        0
44   下村博文  63 104612  自民     前        8
45 前田順一郎  42  60291  立憲     新        0
46   宍戸千絵  39  42668  希望     新        0
47     小堤東  28  25426  共産     新        0
48   太田昭宏  72 112597  公明     前        8
49   池内沙織  35  83544  共産     前        1
50     中村勝  66  21892  諸派     新        0
51   鴨下一郎  68 120744  自民     前        9
52   北條智彦  34  67070  立憲     新        0
53 祖父江元希  42  30807  共産     新        0
54 松島みどり  61 104137  自民     前        6
55   矢作麻子  39  63235  希望     新        0
56   阿藤和之  46  46600  共産     新        0
57   清井美穂  54   4282  諸派     新        0
58 大塚紀久雄  76   3607  無所     新        0
59     秋元司  46 101155  自民     前        3
60   柿沢未途  46  70325  希望     前        4
61   吉田年男  69  34943  共産     新        0
62     猪野隆  52  15667  無所     新        0
63   大西英男  71  84457  自民     前        3
64   初鹿明博  48  71405  立憲     前        3
65   田村謙治  49  50568  希望     元        3
66   平沢勝栄  72 127632  自民     前        8
67   西田主税  55  49485  希望     新        0
68   新井杉生  58  43138  共産     新        0
69     菅直人  71  96713  立憲     前       13
70   土屋正忠  75  95667  自民     前        3
71     鴇田敦  51  45081  希望     新        0
72   松本洋平  44  96229  自民     前        4
73   末松義規  60  90540  立憲     元        6
74 佐々木里加  50  29743  希望     新        0
75   杉下茂雄  68  17377  共産     新        0
76   木原誠二  47 107686  自民     前        4
77     宮本徹  45  57741  共産     前        2
78     鹿野晃  44  50439  希望     新        0
79   長島昭久  55  92356  希望     前        6
80   小田原潔  53  88225  自民     前        3
81   小糸健介  35  38195  社民     新        0
82   天木直人  70   6655  諸派     新        0
83   伊藤達也  56 110493  自民     前        8
84   山花郁夫  50  91073  立憲     元        4
85 金ケ崎絵美  41  30236  希望     新        0
86     阿部真  43  22859  共産     新        0
87   小倉将信  36 110522  自民     前        3
88   伊藤俊輔  38  76450  希望     新        1
89   松村亮佑  37  58929  共産     新        0
90 萩生田光一  54 122331  自民     前        5
91   高橋斉久  44  61441  立憲     新        0
92   吉羽美華  37  39892  希望     新        0
93 飯田美弥子  57  24349  共産     新        0
94   井上信治  48 112014  自民     前        6
95   山下容子  58  44884  立憲     新        0
96   小沢鋭仁  63  38286  希望     前        8
97     井上宣  43  21031  共産     新        0
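
・Before saving, it can help to check the shape of the data frame and tally it by party. The following is only a minimal sketch: df.sample is a hypothetical stand-in that copies the first six rows printed above, and party.n and party.votes are names chosen for illustration. The same calls should work on df.hr.tokyo itself.

```r
# Hypothetical stand-in: the first six rows of df.hr.tokyo as printed above.
df.sample = data.frame(
  name     = c("海江田万里", "山田美樹", "松沢香", "原口実季", "犬丸光加", "又吉光雄"),
  age      = c(68, 43, 39, 28, 57, 73),
  count    = c(96255, 93234, 40376, 3806, 1570, 1307),
  party    = c("立憲", "自民", "希望", "諸派", "諸派", "諸派"),
  status   = c("元", "前", "新", "新", "新", "新"),
  previous = c(7, 3, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Check the number of rows/columns and the type of each column.
dim(df.sample)
str(df.sample)

# Number of candidates per party.
party.n = table(df.sample$party)

# Total votes per party.
party.votes = tapply(df.sample$count, df.sample$party, sum)

party.n
party.votes
```

・If count or age shows up as character rather than numeric in str(), the earlier cleaning steps probably need another look before the data are analyzed.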
# Write the data frame to a CSV file, encoded as CP932 (the Japanese Windows code page).
write.csv(df.hr.tokyo, "hr2017_tokyo.csv",
          fileEncoding = "CP932")
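
・To confirm that the Japanese text survives the export, the file can be read back with the same encoding. This is a minimal sketch under a few assumptions: df.sample is a hypothetical stand-in built from the first three rows printed above, the file goes to a temporary path rather than hr2017_tokyo.csv, the R session runs in a UTF-8 locale, and row.names = FALSE is added here (it is not in the write.csv() call above) so that no row-number column is written.

```r
# Hypothetical stand-in: three rows and three columns of df.hr.tokyo.
df.sample = data.frame(
  name  = c("海江田万里", "山田美樹", "松沢香"),
  count = c(96255, 93234, 40376),
  party = c("立憲", "自民", "希望"),
  stringsAsFactors = FALSE
)

# Write to a temporary file using the same encoding as the call above.
tmp = tempfile(fileext = ".csv")
write.csv(df.sample, tmp, fileEncoding = "CP932", row.names = FALSE)

# Read the file back, declaring the encoding so the text is decoded correctly.
df.check = read.csv(tmp, fileEncoding = "CP932", stringsAsFactors = FALSE)

# TRUE when both the names and the vote counts match the original.
roundtrip.ok = all(df.check$name == df.sample$name) &&
  all(df.check$count == df.sample$count)
roundtrip.ok
```

・When the file is read without fileEncoding = "CP932" in a UTF-8 session, the name and party columns are likely to come back garbled, so the encoding should be stated on both the write and the read side.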

References