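1. Scraping the AKB48 General Election data
-1-1. Reading data from the web
-1-2. Selecting only the needed information (vote counts)
-1-3. Selecting only the needed information (names)
-1-4. Building a data frame
2. Scraping the 2017 general election data (Tokyo)
-2-1. Reading data from the web
-2-2. Selecting only the needed information (names)
-2-3. Selecting only the needed information (ages)
-2-4. Selecting only the needed information (vote counts)
-2-5. Selecting only the needed information (parties)
-2-6. Selecting only the needed information (number of times elected)
-2-7. Selecting only the needed information (status)
-2-8. Building a data frame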
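・With R, you can read text and numbers from a website and build a data frame from them.
・Here we use R to read data from the AKB48 General Election page (http://www.akb48.co.jp/sousenkyo_45th/result.php).
・The AKB48 General Election page presents the following data.
・For example, the data for the first-place finisher, 指原莉乃, is displayed as follows.
・Check the page source that corresponds to the display above.
・In the browser menu, choose "View" → "Developer" → "View Source".
・Connect to the website from R, fetch the whole page source in one call, and name it result.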
result = readLines("http://www.akb48.co.jp/sousenkyo_45th/result.php", encoding = "UTF-8")
・Display the first 6 lines of result, the data just read (head() returns 6 elements by default).
head(result)
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
[2] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
[3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"ja\" xml:lang=\"ja\" xmlns:og=\"http://ogp.me/ns#\">"
[4] "<head>"
[5] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />"
[6] "<meta http-equiv=\"Content-Style-Type\" content=\"text/css\" />"
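The default of six elements can be changed with the n argument of head(). The following is a minimal sketch that runs offline: the temporary file and its ten dummy lines are hypothetical stand-ins for a downloaded page, not part of the original data.

```r
# Hypothetical stand-in for a downloaded page: ten dummy lines in a temp file.
tmp <- tempfile(fileext = ".html")
writeLines(paste0("<p>line ", 1:10, "</p>"), tmp)

# readLines() works on a local file the same way it works on a URL.
page <- readLines(tmp, encoding = "UTF-8")

default_head <- head(page)        # 6 elements by default
first8 <- head(page, n = 8)       # pass n explicitly to get 8

length(default_head)
length(first8)
```

Saving a local copy like this also avoids hitting the website again each time the script is rerun.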
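・Look at "View Source" for the AKB48 General Election page.
・The information we want (the vote counts) comes right after result_count.
・First, use the grep() function to find the lines of result that hold a vote count. grep() returns the positions (indices) of the matching lines
→ name these positions result_count.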
result_count = grep("result_count", result)
・Check the first 6 matching lines to confirm that the extraction worked.
head(result[result_count])
[1] "\t\t <p class=\"result_count\">243,011票</p>"
[2] "\t\t <p class=\"result_count\">175,613票</p>"
[3] "\t\t <p class=\"result_count\">112,341票</p>"
[4] "\t\t <p class=\"result_count\">110,411票</p>"
[5] "\t\t <p class=\"result_count\">92,110票</p>"
[6] "\t\t <p class=\"result_count\">78,279票</p>"
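Because grep() returns positions rather than text, the lines themselves are obtained by indexing, as in result[result_count]. A small sketch of the difference follows; the four sample lines are modelled on the page source shown above, and the result_box line is a made-up filler.

```r
# Sample lines modelled on the page source; "result_box" is a hypothetical filler.
src <- c(
  "<div class=\"result_box\">",
  "\t\t <p class=\"result_count\">243,011票</p>",
  "\t\t <h4 class=\"result_name\">指原 莉乃</h4>",
  "\t\t <p class=\"result_count\">175,613票</p>"
)

idx  <- grep("result_count", src)                # positions of matching lines
hits <- grep("result_count", src, value = TRUE)  # the matching lines themselves

idx
identical(src[idx], hits)
```

Keeping the indices, as the text does, is convenient when the same positions are reused later.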
・Use the gsub() function to remove everything except the vote count.
gsub(".*<p class=\"result_count\">(.*)票</p>","\\1", result[result_count])
[1] "243,011" "175,613" "112,341" "110,411" "92,110" "78,279" "69,159"
[8] "68,126" "60,591" "58,624" "58,610" "50,190" "47,094" "43,318"
[15] "40,648" "40,071" "40,011" "36,894" "33,524" "33,176" "32,886"
[22] "32,118" "31,314" "29,983" "29,517" "29,333" "29,213" "28,706"
[29] "28,553" "28,369" "28,282" "28,260" "27,487" "26,152" "25,963"
[36] "25,613" "25,039" "24,059" "23,251" "22,995" "22,429" "21,881"
[43] "21,864" "21,559" "21,009" "20,980" "20,913" "20,643" "20,618"
[50] "20,228" "20,021" "19,534" "19,377" "19,326" "19,274" "19,140"
[57] "18,524" "17,898" "16,839" "16,691" "16,548" "16,246" "15,994"
[64] "15,793" "15,716" "15,697" "15,600" "15,057" "14,950" "14,913"
[71] "14,550" "14,544" "14,177" "13,882" "13,657" "13,571" "13,512"
[78] "13,366" "13,204" "13,058"
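The substitution above relies on a capture group: the part of the pattern in parentheses is remembered and written back with \\1, while everything else on the line is discarded. A minimal sketch on a single line copied from the output above (the non-matching line is a made-up example):

```r
line <- "\t\t <p class=\"result_count\">243,011票</p>"
pat  <- ".*<p class=\"result_count\">(.*)票</p>"

# The leading .* swallows the tabs and spaces; (.*) captures the digits and comma.
votes <- gsub(pat, "\\1", line)
votes

# A line that does not match the pattern is returned unchanged.
unchanged <- gsub(pat, "\\1", "<p>no match</p>")
unchanged
```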
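・Because the values contain commas, they are still character strings, so
→ use the gsub() function to remove the commas,
→ use the as.integer() function to convert the strings to integers,
→ and name the converted values akb_count.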
akb_count = as.integer(gsub(",", "",
gsub(".*<p class=\"result_count\">(.*)票</p>","\\1", result[result_count])))
・Check the result.
akb_count
[1] 243011 175613 112341 110411 92110 78279 69159 68126 60591 58624
[11] 58610 50190 47094 43318 40648 40071 40011 36894 33524 33176
[21] 32886 32118 31314 29983 29517 29333 29213 28706 28553 28369
[31] 28282 28260 27487 26152 25963 25613 25039 24059 23251 22995
[41] 22429 21881 21864 21559 21009 20980 20913 20643 20618 20228
[51] 20021 19534 19377 19326 19274 19140 18524 17898 16839 16691
[61] 16548 16246 15994 15793 15716 15697 15600 15057 14950 14913
[71] 14550 14544 14177 13882 13657 13571 13512 13366 13204 13058
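To see why the comma has to go first, the sketch below converts two of the values above with and without the cleanup. The variable names are illustrative only.

```r
raw <- c("243,011", "92,110")

# With the comma still present, as.integer() cannot parse the string:
# it returns NA and emits a coercion warning (suppressed here).
bad <- suppressWarnings(as.integer(raw))

# Removing the comma first gives proper integers.
ok <- as.integer(gsub(",", "", raw))

bad
ok
is.integer(ok)
```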
・Get the members' names in the same way.
result_name = grep("result_name", result)
・Check the first 6 matching lines to confirm that the extraction worked.
head(result[result_name])
[1] "\t\t <h4 class=\"result_name\">指原 莉乃</h4>"
[2] "\t\t <h4 class=\"result_name\">渡辺 麻友</h4>"
[3] "\t\t <h4 class=\"result_name\">松井 珠理奈</h4>"
[4] "\t\t <h4 class=\"result_name\">山本 彩</h4>"
[5] "\t\t <h4 class=\"result_name\">柏木 由紀</h4>"
[6] "\t\t <h4 class=\"result_name\">宮脇 咲良</h4>"
・Use the gsub() function to remove everything except the name.
gsub(".*<h4 class=\"result_name\">(.*)</h4>","\\1",result[result_name])
[1] "指原 莉乃" "渡辺 麻友"
[3] "松井 珠理奈" "山本 彩"
[5] "柏木 由紀" "宮脇 咲良"
[7] "須田 亜香里" "島崎 遥香"
[9] "兒玉 遥" "武藤 十夢"
[11] "横山 由依" "北原 里英"
[13] "向井地 美音" "岡田 奈々"
[15] "高橋 朱里" "にゃんにゃん仮面(小嶋陽菜)"
[17] "峯岸 みなみ" "入山 杏奈"
[19] "小嶋 真子" "高柳 明音"
[21] "込山 榛香" "大場 美奈"
[23] "朝長 美桜" "白間 美瑠"
[25] "沖田 彩華" "加藤 玲奈"
[27] "川本 紗矢" "矢吹 奈子"
[29] "古畑 奈和" "惣田 紗莉渚"
[31] "竹内 彩姫" "大島 涼花"
[33] "矢倉 楓子" "倉野尾 成美"
[35] "江籠 裕奈" "本村 碧唯"
[37] "木崎 ゆりあ" "佐々木 優佳里"
[39] "薮下 柊" "渕上 舞"
[41] "藤江 れいな" "冨吉 明日香"
[43] "田島 芽瑠" "須藤 凜々花"
[45] "田中 美久" "松岡 菜摘"
[47] "茂木 忍" "井上 由莉耶"
[49] "二村 春香" "森保 まどか"
[51] "岩立 沙穂" "太田 夢莉"
[53] "神志那 結衣" "竹内 舞"
[55] "谷 真理佳" "渋谷 凪咲"
[57] "岡田 彩花" "植木 南央"
[59] "坂口 理子" "駒田 京伽"
[61] "西野 未姫" "大和田 南那"
[63] "酒井 萌衣" "北川 綾巴"
[65] "宮前 杏実" "岸野 里香"
[67] "熊崎 晴香" "木本 花音"
[69] "谷口 めぐ" "坂口 渚沙"
[71] "山内 鈴蘭" "秋吉 優花"
[73] "大森 美優" "鎌田 菜月"
[75] "佐藤 すみれ" "加藤 美南"
[77] "吉田 朱里" "宮崎 美穂"
[79] "日高 優月" "村重 杏奈"
・Assign the names-only data produced by the gsub() substitution above to akb_names.
akb_names = gsub(".*<h4 class=\"result_name\">(.*)</h4>","\\1",result[result_name])
・Check the result.
akb_names
[1] "指原 莉乃" "渡辺 麻友"
[3] "松井 珠理奈" "山本 彩"
[5] "柏木 由紀" "宮脇 咲良"
[7] "須田 亜香里" "島崎 遥香"
[9] "兒玉 遥" "武藤 十夢"
[11] "横山 由依" "北原 里英"
[13] "向井地 美音" "岡田 奈々"
[15] "高橋 朱里" "にゃんにゃん仮面(小嶋陽菜)"
[17] "峯岸 みなみ" "入山 杏奈"
[19] "小嶋 真子" "高柳 明音"
[21] "込山 榛香" "大場 美奈"
[23] "朝長 美桜" "白間 美瑠"
[25] "沖田 彩華" "加藤 玲奈"
[27] "川本 紗矢" "矢吹 奈子"
[29] "古畑 奈和" "惣田 紗莉渚"
[31] "竹内 彩姫" "大島 涼花"
[33] "矢倉 楓子" "倉野尾 成美"
[35] "江籠 裕奈" "本村 碧唯"
[37] "木崎 ゆりあ" "佐々木 優佳里"
[39] "薮下 柊" "渕上 舞"
[41] "藤江 れいな" "冨吉 明日香"
[43] "田島 芽瑠" "須藤 凜々花"
[45] "田中 美久" "松岡 菜摘"
[47] "茂木 忍" "井上 由莉耶"
[49] "二村 春香" "森保 まどか"
[51] "岩立 沙穂" "太田 夢莉"
[53] "神志那 結衣" "竹内 舞"
[55] "谷 真理佳" "渋谷 凪咲"
[57] "岡田 彩花" "植木 南央"
[59] "坂口 理子" "駒田 京伽"
[61] "西野 未姫" "大和田 南那"
[63] "酒井 萌衣" "北川 綾巴"
[65] "宮前 杏実" "岸野 里香"
[67] "熊崎 晴香" "木本 花音"
[69] "谷口 めぐ" "坂口 渚沙"
[71] "山内 鈴蘭" "秋吉 優花"
[73] "大森 美優" "鎌田 菜月"
[75] "佐藤 すみれ" "加藤 美南"
[77] "吉田 朱里" "宮崎 美穂"
[79] "日高 優月" "村重 杏奈"
・Use the data.frame() function to combine akb_names and akb_count into a data frame named df.akb.
df.akb = data.frame(akb_names, akb_count)
・Check that the data frame was built correctly.
df.akb
akb_names akb_count
1 指原 莉乃 243011
2 渡辺 麻友 175613
3 松井 珠理奈 112341
4 山本 彩 110411
5 柏木 由紀 92110
6 宮脇 咲良 78279
7 須田 亜香里 69159
8 島崎 遥香 68126
9 兒玉 遥 60591
10 武藤 十夢 58624
11 横山 由依 58610
12 北原 里英 50190
13 向井地 美音 47094
14 岡田 奈々 43318
15 高橋 朱里 40648
16 にゃんにゃん仮面(小嶋陽菜) 40071
17 峯岸 みなみ 40011
18 入山 杏奈 36894
19 小嶋 真子 33524
20 高柳 明音 33176
21 込山 榛香 32886
22 大場 美奈 32118
23 朝長 美桜 31314
24 白間 美瑠 29983
25 沖田 彩華 29517
26 加藤 玲奈 29333
27 川本 紗矢 29213
28 矢吹 奈子 28706
29 古畑 奈和 28553
30 惣田 紗莉渚 28369
31 竹内 彩姫 28282
32 大島 涼花 28260
33 矢倉 楓子 27487
34 倉野尾 成美 26152
35 江籠 裕奈 25963
36 本村 碧唯 25613
37 木崎 ゆりあ 25039
38 佐々木 優佳里 24059
39 薮下 柊 23251
40 渕上 舞 22995
41 藤江 れいな 22429
42 冨吉 明日香 21881
43 田島 芽瑠 21864
44 須藤 凜々花 21559
45 田中 美久 21009
46 松岡 菜摘 20980
47 茂木 忍 20913
48 井上 由莉耶 20643
49 二村 春香 20618
50 森保 まどか 20228
51 岩立 沙穂 20021
52 太田 夢莉 19534
53 神志那 結衣 19377
54 竹内 舞 19326
55 谷 真理佳 19274
56 渋谷 凪咲 19140
57 岡田 彩花 18524
58 植木 南央 17898
59 坂口 理子 16839
60 駒田 京伽 16691
61 西野 未姫 16548
62 大和田 南那 16246
63 酒井 萌衣 15994
64 北川 綾巴 15793
65 宮前 杏実 15716
66 岸野 里香 15697
67 熊崎 晴香 15600
68 木本 花音 15057
69 谷口 めぐ 14950
70 坂口 渚沙 14913
71 山内 鈴蘭 14550
72 秋吉 優花 14544
73 大森 美優 14177
74 鎌田 菜月 13882
75 佐藤 すみれ 13657
76 加藤 美南 13571
77 吉田 朱里 13512
78 宮崎 美穂 13366
79 日高 優月 13204
80 村重 杏奈 13058
・Save the data frame df.akb as a CSV file.
write.csv(df.akb, "akb48.csv",
fileEncoding = "CP932")
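The call above also writes the row numbers as an unnamed first column. The sketch below shows one way to drop them and to confirm that the file reads back correctly. It assumes UTF-8 instead of CP932 so that it runs on any platform, and it uses a two-row sample and a temporary file rather than the full df.akb.

```r
# Two-row sample standing in for df.akb (assumption: same column names).
df <- data.frame(akb_names = c("指原 莉乃", "渡辺 麻友"),
                 akb_count = c(243011L, 175613L),
                 stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")

# row.names = FALSE omits the 1, 2, ... column from the file.
write.csv(df, out, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back with the same encoding to check the round trip.
back <- read.csv(out, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
names(back)
identical(back$akb_count, df$akb_count)
```

CP932 is the Windows code page for Shift_JIS; it is a reasonable choice when the file will be opened on Japanese Windows, while UTF-8 is the safer default elsewhere.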
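・Here we use R to read data from the Asahi Shimbun 2017 general election site (http://www.asahi.com/senkyo/senkyo2017/).
・On the Asahi Shimbun 2017 general election page for the Tokyo constituencies, choose "View" → "Developer" → "View Source".
・Connect to the website from R, fetch the whole page source in one call, and name it tokyo.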
tokyo = readLines("http://www.asahi.com/senkyo/senkyo2017/kaihyo/A13.html", encoding = "UTF-8")
・Display the first 6 lines of tokyo, the data just read.
head(tokyo)
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
[2] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
[3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"ja\" lang=\"ja\" dir=\"ltr\" xmlns:og=\"http://ogp.me/ns#\" xmlns:mixi=\"http://mixi-platform.com/ns#\" xmlns:fb=\"http://www.facebook.com/2008/fbml\">"
[4] "<head>"
[5] "<!-- DTM上 -->"
[6] "<script src=\"//assets.adobedtm.com/d7e679c95b1f3fceafd1fcdf47a9b3bc7a11d039/satelliteLib-b5f070ddaa8837c4b9c5d3e0509562a889b01b07.js\"></script>"
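・"View Source" for the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that the candidate names we want come right after sei.
・Find the lines that hold this name information with grep() and store their positions as sei.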
sei <- grep("\"sei\"", tokyo)
tokyo[sei]
[1] "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>"
[2] "<td class=\"namae\"><div><span class=\"sei\">山田</span><span class=\"mei\">美樹</span><span class=\"age\">(43)</span></div></td>"
[3] "<td class=\"namae\"><div><span class=\"sei\">松沢</span><span class=\"mei\">香</span><span class=\"age\">(39)</span></div></td>"
[4] "<td class=\"namae\"><div><span class=\"sei\">原口</span><span class=\"mei\">実季</span><span class=\"age\">(28)</span></div></td>"
[5] "<td class=\"namae\"><div><span class=\"sei\">犬丸</span><span class=\"mei\">光加</span><span class=\"age\">(57)</span></div></td>"
[6] "<td class=\"namae\"><div><span class=\"sei\">又吉</span><span class=\"mei\">光雄</span><span class=\"age\">(73)</span></div></td>"
[7] "<td class=\"namae\"><div><span class=\"sei\">辻</span><span class=\"mei\">清人</span><span class=\"age\">(38)</span></div></td>"
[8] "<td class=\"namae\"><div><span class=\"sei\">松尾</span><span class=\"mei\">明弘</span><span class=\"age\">(42)</span></div></td>"
[9] "<td class=\"namae\"><div><span class=\"sei\">鳩山</span><span class=\"mei\">太郎</span><span class=\"age\">(43)</span></div></td>"
[10] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">宏高</span><span class=\"age\">(53)</span></div></td>"
[11] "<td class=\"namae\"><div><span class=\"sei\">松原</span><span class=\"mei\">仁</span><span class=\"age\">(61)</span></div></td>"
[12] "<td class=\"namae\"><div><span class=\"sei\">香西</span><span class=\"mei\">克介</span><span class=\"age\">(41)</span></div></td>"
[13] "<td class=\"namae\"><div><span class=\"sei\">平</span><span class=\"mei\">将明</span><span class=\"age\">(50)</span></div></td>"
[14] "<td class=\"namae\"><div><span class=\"sei\">井戸</span><span class=\"mei\">正枝</span><span class=\"age\">(51)</span></div></td>"
[15] "<td class=\"namae\"><div><span class=\"sei\">難波</span><span class=\"mei\">美智代</span><span class=\"age\">(43)</span></div></td>"
[16] "<td class=\"namae\"><div><span class=\"sei\">青山</span><span class=\"mei\">昂平</span><span class=\"age\">(26)</span></div></td>"
[17] "<td class=\"namae\"><div><span class=\"sei\">若宮</span><span class=\"mei\">健嗣</span><span class=\"age\">(56)</span></div></td>"
[18] "<td class=\"namae\"><div><span class=\"sei\">手塚</span><span class=\"mei\">仁雄</span><span class=\"age\">(51)</span></div></td>"
[19] "<td class=\"namae\"><div><span class=\"sei\">福田</span><span class=\"mei\">峰之</span><span class=\"age\">(53)</span></div></td>"
[20] "<td class=\"namae\"><div><span class=\"sei\">落合</span><span class=\"mei\">貴之</span><span class=\"age\">(38)</span></div></td>"
[21] "<td class=\"namae\"><div><span class=\"sei\">越智</span><span class=\"mei\">隆雄</span><span class=\"age\">(53)</span></div></td>"
[22] "<td class=\"namae\"><div><span class=\"sei\">植松</span><span class=\"mei\">恵美子</span><span class=\"age\">(49)</span></div></td>"
[23] "<td class=\"namae\"><div><span class=\"sei\">中岡</span><span class=\"mei\">茉妃</span><span class=\"age\">(26)</span></div></td>"
[24] "<td class=\"namae\"><div><span class=\"sei\">長妻</span><span class=\"mei\">昭</span><span class=\"age\">(57)</span></div></td>"
[25] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">文明</span><span class=\"age\">(68)</span></div></td>"
[26] "<td class=\"namae\"><div><span class=\"sei\">荒木</span><span class=\"mei\">章博</span><span class=\"age\">(64)</span></div></td>"
[27] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">郁磨</span><span class=\"age\">(26)</span></div></td>"
[28] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">伸晃</span><span class=\"age\">(60)</span></div></td>"
[29] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">晴美</span><span class=\"age\">(45)</span></div></td>"
[30] "<td class=\"namae\"><div><span class=\"sei\">木内</span><span class=\"mei\">孝胤</span><span class=\"age\">(51)</span></div></td>"
[31] "<td class=\"namae\"><div><span class=\"sei\">長内</span><span class=\"mei\">史子</span><span class=\"age\">(29)</span></div></td>"
[32] "<td class=\"namae\"><div><span class=\"sei\">円</span><span class=\"mei\">より子</span><span class=\"age\">(70)</span></div></td>"
[33] "<td class=\"namae\"><div><span class=\"sei\">斎藤</span><span class=\"mei\">郁真</span><span class=\"age\">(29)</span></div></td>"
[34] "<td class=\"namae\"><div><span class=\"sei\">菅原</span><span class=\"mei\">一秀</span><span class=\"age\">(55)</span></div></td>"
[35] "<td class=\"namae\"><div><span class=\"sei\">高松</span><span class=\"mei\">智之</span><span class=\"age\">(43)</span></div></td>"
[36] "<td class=\"namae\"><div><span class=\"sei\">原</span><span class=\"mei\">純子</span><span class=\"age\">(53)</span></div></td>"
[37] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">吉成</span><span class=\"age\">(62)</span></div></td>"
[38] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">隼人</span><span class=\"age\">(40)</span></div></td>"
[39] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">庸介</span><span class=\"age\">(41)</span></div></td>"
[40] "<td class=\"namae\"><div><span class=\"sei\">若狭</span><span class=\"mei\">勝</span><span class=\"age\">(60)</span></div></td>"
[41] "<td class=\"namae\"><div><span class=\"sei\">岸</span><span class=\"mei\">良信</span><span class=\"age\">(62)</span></div></td>"
[42] "<td class=\"namae\"><div><span class=\"sei\">小山</span><span class=\"mei\">徹</span><span class=\"age\">(42)</span></div></td>"
[43] "<td class=\"namae\"><div><span class=\"sei\">吉井</span><span class=\"mei\">利光</span><span class=\"age\">(35)</span></div></td>"
[44] "<td class=\"namae\"><div><span class=\"sei\">下村</span><span class=\"mei\">博文</span><span class=\"age\">(63)</span></div></td>"
[45] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">順一郎</span><span class=\"age\">(42)</span></div></td>"
[46] "<td class=\"namae\"><div><span class=\"sei\">宍戸</span><span class=\"mei\">千絵</span><span class=\"age\">(39)</span></div></td>"
[47] "<td class=\"namae\"><div><span class=\"sei\">小堤</span><span class=\"mei\">東</span><span class=\"age\">(28)</span></div></td>"
[48] "<td class=\"namae\"><div><span class=\"sei\">太田</span><span class=\"mei\">昭宏</span><span class=\"age\">(72)</span></div></td>"
[49] "<td class=\"namae\"><div><span class=\"sei\">池内</span><span class=\"mei\">沙織</span><span class=\"age\">(35)</span></div></td>"
[50] "<td class=\"namae\"><div><span class=\"sei\">中村</span><span class=\"mei\">勝</span><span class=\"age\">(66)</span></div></td>"
[51] "<td class=\"namae\"><div><span class=\"sei\">鴨下</span><span class=\"mei\">一郎</span><span class=\"age\">(68)</span></div></td>"
[52] "<td class=\"namae\"><div><span class=\"sei\">北條</span><span class=\"mei\">智彦</span><span class=\"age\">(34)</span></div></td>"
[53] "<td class=\"namae\"><div><span class=\"sei\">祖父江</span><span class=\"mei\">元希</span><span class=\"age\">(42)</span></div></td>"
[54] "<td class=\"namae\"><div><span class=\"sei\">松島</span><span class=\"mei\">みどり</span><span class=\"age\">(61)</span></div></td>"
[55] "<td class=\"namae\"><div><span class=\"sei\">矢作</span><span class=\"mei\">麻子</span><span class=\"age\">(39)</span></div></td>"
[56] "<td class=\"namae\"><div><span class=\"sei\">阿藤</span><span class=\"mei\">和之</span><span class=\"age\">(46)</span></div></td>"
[57] "<td class=\"namae\"><div><span class=\"sei\">清井</span><span class=\"mei\">美穂</span><span class=\"age\">(54)</span></div></td>"
[58] "<td class=\"namae\"><div><span class=\"sei\">大塚</span><span class=\"mei\">紀久雄</span><span class=\"age\">(76)</span></div></td>"
[59] "<td class=\"namae\"><div><span class=\"sei\">秋元</span><span class=\"mei\">司</span><span class=\"age\">(46)</span></div></td>"
[60] "<td class=\"namae\"><div><span class=\"sei\">柿沢</span><span class=\"mei\">未途</span><span class=\"age\">(46)</span></div></td>"
[61] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">年男</span><span class=\"age\">(69)</span></div></td>"
[62] "<td class=\"namae\"><div><span class=\"sei\">猪野</span><span class=\"mei\">隆</span><span class=\"age\">(52)</span></div></td>"
[63] "<td class=\"namae\"><div><span class=\"sei\">大西</span><span class=\"mei\">英男</span><span class=\"age\">(71)</span></div></td>"
[64] "<td class=\"namae\"><div><span class=\"sei\">初鹿</span><span class=\"mei\">明博</span><span class=\"age\">(48)</span></div></td>"
[65] "<td class=\"namae\"><div><span class=\"sei\">田村</span><span class=\"mei\">謙治</span><span class=\"age\">(49)</span></div></td>"
[66] "<td class=\"namae\"><div><span class=\"sei\">平沢</span><span class=\"mei\">勝栄</span><span class=\"age\">(72)</span></div></td>"
[67] "<td class=\"namae\"><div><span class=\"sei\">西田</span><span class=\"mei\">主税</span><span class=\"age\">(55)</span></div></td>"
[68] "<td class=\"namae\"><div><span class=\"sei\">新井</span><span class=\"mei\">杉生</span><span class=\"age\">(58)</span></div></td>"
[69] "<td class=\"namae\"><div><span class=\"sei\">菅</span><span class=\"mei\">直人</span><span class=\"age\">(71)</span></div></td>"
[70] "<td class=\"namae\"><div><span class=\"sei\">土屋</span><span class=\"mei\">正忠</span><span class=\"age\">(75)</span></div></td>"
[71] "<td class=\"namae\"><div><span class=\"sei\">鴇田</span><span class=\"mei\">敦</span><span class=\"age\">(51)</span></div></td>"
[72] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">洋平</span><span class=\"age\">(44)</span></div></td>"
[73] "<td class=\"namae\"><div><span class=\"sei\">末松</span><span class=\"mei\">義規</span><span class=\"age\">(60)</span></div></td>"
[74] "<td class=\"namae\"><div><span class=\"sei\">佐々木</span><span class=\"mei\">里加</span><span class=\"age\">(50)</span></div></td>"
[75] "<td class=\"namae\"><div><span class=\"sei\">杉下</span><span class=\"mei\">茂雄</span><span class=\"age\">(68)</span></div></td>"
[76] "<td class=\"namae\"><div><span class=\"sei\">木原</span><span class=\"mei\">誠二</span><span class=\"age\">(47)</span></div></td>"
[77] "<td class=\"namae\"><div><span class=\"sei\">宮本</span><span class=\"mei\">徹</span><span class=\"age\">(45)</span></div></td>"
[78] "<td class=\"namae\"><div><span class=\"sei\">鹿野</span><span class=\"mei\">晃</span><span class=\"age\">(44)</span></div></td>"
[79] "<td class=\"namae\"><div><span class=\"sei\">長島</span><span class=\"mei\">昭久</span><span class=\"age\">(55)</span></div></td>"
[80] "<td class=\"namae\"><div><span class=\"sei\">小田原</span><span class=\"mei\">潔</span><span class=\"age\">(53)</span></div></td>"
[81] "<td class=\"namae\"><div><span class=\"sei\">小糸</span><span class=\"mei\">健介</span><span class=\"age\">(35)</span></div></td>"
[82] "<td class=\"namae\"><div><span class=\"sei\">天木</span><span class=\"mei\">直人</span><span class=\"age\">(70)</span></div></td>"
[83] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">達也</span><span class=\"age\">(56)</span></div></td>"
[84] "<td class=\"namae\"><div><span class=\"sei\">山花</span><span class=\"mei\">郁夫</span><span class=\"age\">(50)</span></div></td>"
[85] "<td class=\"namae\"><div><span class=\"sei\">金ケ崎</span><span class=\"mei\">絵美</span><span class=\"age\">(41)</span></div></td>"
[86] "<td class=\"namae\"><div><span class=\"sei\">阿部</span><span class=\"mei\">真</span><span class=\"age\">(43)</span></div></td>"
[87] "<td class=\"namae\"><div><span class=\"sei\">小倉</span><span class=\"mei\">将信</span><span class=\"age\">(36)</span></div></td>"
[88] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">俊輔</span><span class=\"age\">(38)</span></div></td>"
[89] "<td class=\"namae\"><div><span class=\"sei\">松村</span><span class=\"mei\">亮佑</span><span class=\"age\">(37)</span></div></td>"
[90] "<td class=\"namae\"><div><span class=\"sei\">萩生田</span><span class=\"mei\">光一</span><span class=\"age\">(54)</span></div></td>"
[91] "<td class=\"namae\"><div><span class=\"sei\">高橋</span><span class=\"mei\">斉久</span><span class=\"age\">(44)</span></div></td>"
[92] "<td class=\"namae\"><div><span class=\"sei\">吉羽</span><span class=\"mei\">美華</span><span class=\"age\">(37)</span></div></td>"
[93] "<td class=\"namae\"><div><span class=\"sei\">飯田</span><span class=\"mei\">美弥子</span><span class=\"age\">(57)</span></div></td>"
[94] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">信治</span><span class=\"age\">(48)</span></div></td>"
[95] "<td class=\"namae\"><div><span class=\"sei\">山下</span><span class=\"mei\">容子</span><span class=\"age\">(58)</span></div></td>"
[96] "<td class=\"namae\"><div><span class=\"sei\">小沢</span><span class=\"mei\">鋭仁</span><span class=\"age\">(63)</span></div></td>"
[97] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">宣</span><span class=\"age\">(43)</span></div></td>"
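・Use the gsub function to strip the HTML tags, keeping the surname, given name, and age separated by commas. The pattern defines only three capture groups, so, as the output shows, \\4 expands to nothing and each element ends with a trailing ", ".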
tokyo_name <- gsub("<td class=\"namae\"><div><span class=\"sei\">(.*)</span><span class=\"mei\">(.*)</span><span class=\"age\">(.*)</span></div></td>","\\1, \\2, \\3, \\4", tokyo[sei])
tokyo_name
[1] "海江田, 万里, (68), " "山田, 美樹, (43), " "松沢, 香, (39), "
[4] "原口, 実季, (28), " "犬丸, 光加, (57), " "又吉, 光雄, (73), "
[7] "辻, 清人, (38), " "松尾, 明弘, (42), " "鳩山, 太郎, (43), "
[10] "石原, 宏高, (53), " "松原, 仁, (61), " "香西, 克介, (41), "
[13] "平, 将明, (50), " "井戸, 正枝, (51), " "難波, 美智代, (43), "
[16] "青山, 昂平, (26), " "若宮, 健嗣, (56), " "手塚, 仁雄, (51), "
[19] "福田, 峰之, (53), " "落合, 貴之, (38), " "越智, 隆雄, (53), "
[22] "植松, 恵美子, (49), " "中岡, 茉妃, (26), " "長妻, 昭, (57), "
[25] "松本, 文明, (68), " "荒木, 章博, (64), " "井上, 郁磨, (26), "
[28] "石原, 伸晃, (60), " "吉田, 晴美, (45), " "木内, 孝胤, (51), "
[31] "長内, 史子, (29), " "円, より子, (70), " "斎藤, 郁真, (29), "
[34] "菅原, 一秀, (55), " "高松, 智之, (43), " "原, 純子, (53), "
[37] "前田, 吉成, (62), " "鈴木, 隼人, (40), " "鈴木, 庸介, (41), "
[40] "若狭, 勝, (60), " "岸, 良信, (62), " "小山, 徹, (42), "
[43] "吉井, 利光, (35), " "下村, 博文, (63), " "前田, 順一郎, (42), "
[46] "宍戸, 千絵, (39), " "小堤, 東, (28), " "太田, 昭宏, (72), "
[49] "池内, 沙織, (35), " "中村, 勝, (66), " "鴨下, 一郎, (68), "
[52] "北條, 智彦, (34), " "祖父江, 元希, (42), " "松島, みどり, (61), "
[55] "矢作, 麻子, (39), " "阿藤, 和之, (46), " "清井, 美穂, (54), "
[58] "大塚, 紀久雄, (76), " "秋元, 司, (46), " "柿沢, 未途, (46), "
[61] "吉田, 年男, (69), " "猪野, 隆, (52), " "大西, 英男, (71), "
[64] "初鹿, 明博, (48), " "田村, 謙治, (49), " "平沢, 勝栄, (72), "
[67] "西田, 主税, (55), " "新井, 杉生, (58), " "菅, 直人, (71), "
[70] "土屋, 正忠, (75), " "鴇田, 敦, (51), " "松本, 洋平, (44), "
[73] "末松, 義規, (60), " "佐々木, 里加, (50), " "杉下, 茂雄, (68), "
[76] "木原, 誠二, (47), " "宮本, 徹, (45), " "鹿野, 晃, (44), "
[79] "長島, 昭久, (55), " "小田原, 潔, (53), " "小糸, 健介, (35), "
[82] "天木, 直人, (70), " "伊藤, 達也, (56), " "山花, 郁夫, (50), "
[85] "金ケ崎, 絵美, (41), " "阿部, 真, (43), " "小倉, 将信, (36), "
[88] "伊藤, 俊輔, (38), " "松村, 亮佑, (37), " "萩生田, 光一, (54), "
[91] "高橋, 斉久, (44), " "吉羽, 美華, (37), " "飯田, 美弥子, (57), "
[94] "井上, 信治, (48), " "山下, 容子, (58), " "小沢, 鋭仁, (63), "
[97] "井上, 宣, (43), "
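・The result above still contains extra characters besides the name (parentheses, commas, and so on).
→ The steps below remove these leftover characters by substitution.
・Use the gsub function to remove the parentheses ().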
tokyo_name <- gsub("[()]","",tokyo_name)
tokyo_name
[1] "海江田, 万里, 68, " "山田, 美樹, 43, " "松沢, 香, 39, "
[4] "原口, 実季, 28, " "犬丸, 光加, 57, " "又吉, 光雄, 73, "
[7] "辻, 清人, 38, " "松尾, 明弘, 42, " "鳩山, 太郎, 43, "
[10] "石原, 宏高, 53, " "松原, 仁, 61, " "香西, 克介, 41, "
[13] "平, 将明, 50, " "井戸, 正枝, 51, " "難波, 美智代, 43, "
[16] "青山, 昂平, 26, " "若宮, 健嗣, 56, " "手塚, 仁雄, 51, "
[19] "福田, 峰之, 53, " "落合, 貴之, 38, " "越智, 隆雄, 53, "
[22] "植松, 恵美子, 49, " "中岡, 茉妃, 26, " "長妻, 昭, 57, "
[25] "松本, 文明, 68, " "荒木, 章博, 64, " "井上, 郁磨, 26, "
[28] "石原, 伸晃, 60, " "吉田, 晴美, 45, " "木内, 孝胤, 51, "
[31] "長内, 史子, 29, " "円, より子, 70, " "斎藤, 郁真, 29, "
[34] "菅原, 一秀, 55, " "高松, 智之, 43, " "原, 純子, 53, "
[37] "前田, 吉成, 62, " "鈴木, 隼人, 40, " "鈴木, 庸介, 41, "
[40] "若狭, 勝, 60, " "岸, 良信, 62, " "小山, 徹, 42, "
[43] "吉井, 利光, 35, " "下村, 博文, 63, " "前田, 順一郎, 42, "
[46] "宍戸, 千絵, 39, " "小堤, 東, 28, " "太田, 昭宏, 72, "
[49] "池内, 沙織, 35, " "中村, 勝, 66, " "鴨下, 一郎, 68, "
[52] "北條, 智彦, 34, " "祖父江, 元希, 42, " "松島, みどり, 61, "
[55] "矢作, 麻子, 39, " "阿藤, 和之, 46, " "清井, 美穂, 54, "
[58] "大塚, 紀久雄, 76, " "秋元, 司, 46, " "柿沢, 未途, 46, "
[61] "吉田, 年男, 69, " "猪野, 隆, 52, " "大西, 英男, 71, "
[64] "初鹿, 明博, 48, " "田村, 謙治, 49, " "平沢, 勝栄, 72, "
[67] "西田, 主税, 55, " "新井, 杉生, 58, " "菅, 直人, 71, "
[70] "土屋, 正忠, 75, " "鴇田, 敦, 51, " "松本, 洋平, 44, "
[73] "末松, 義規, 60, " "佐々木, 里加, 50, " "杉下, 茂雄, 68, "
[76] "木原, 誠二, 47, " "宮本, 徹, 45, " "鹿野, 晃, 44, "
[79] "長島, 昭久, 55, " "小田原, 潔, 53, " "小糸, 健介, 35, "
[82] "天木, 直人, 70, " "伊藤, 達也, 56, " "山花, 郁夫, 50, "
[85] "金ケ崎, 絵美, 41, " "阿部, 真, 43, " "小倉, 将信, 36, "
[88] "伊藤, 俊輔, 38, " "松村, 亮佑, 37, " "萩生田, 光一, 54, "
[91] "高橋, 斉久, 44, " "吉羽, 美華, 37, " "飯田, 美弥子, 57, "
[94] "井上, 信治, 48, " "山下, 容子, 58, " "小沢, 鋭仁, 63, "
[97] "井上, 宣, 43, "
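The pattern "[()]" works without escaping because parentheses are literal characters inside a character class. Outside a class they delimit capture groups and must be escaped. A short sketch comparing the two forms on one element from the output above:

```r
x <- "海江田, 万里, (68), "

# Inside [...] the parentheses are ordinary characters.
a <- gsub("[()]", "", x)

# Outside a character class each parenthesis needs a backslash escape.
b <- gsub("\\(|\\)", "", x)

a
identical(a, b)
```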
・Use the gsub function to remove the commas.
tokyo_name <- gsub(",","",tokyo_name)
tokyo_name
[1] "海江田 万里 68 " "山田 美樹 43 " "松沢 香 39 "
[4] "原口 実季 28 " "犬丸 光加 57 " "又吉 光雄 73 "
[7] "辻 清人 38 " "松尾 明弘 42 " "鳩山 太郎 43 "
[10] "石原 宏高 53 " "松原 仁 61 " "香西 克介 41 "
[13] "平 将明 50 " "井戸 正枝 51 " "難波 美智代 43 "
[16] "青山 昂平 26 " "若宮 健嗣 56 " "手塚 仁雄 51 "
[19] "福田 峰之 53 " "落合 貴之 38 " "越智 隆雄 53 "
[22] "植松 恵美子 49 " "中岡 茉妃 26 " "長妻 昭 57 "
[25] "松本 文明 68 " "荒木 章博 64 " "井上 郁磨 26 "
[28] "石原 伸晃 60 " "吉田 晴美 45 " "木内 孝胤 51 "
[31] "長内 史子 29 " "円 より子 70 " "斎藤 郁真 29 "
[34] "菅原 一秀 55 " "高松 智之 43 " "原 純子 53 "
[37] "前田 吉成 62 " "鈴木 隼人 40 " "鈴木 庸介 41 "
[40] "若狭 勝 60 " "岸 良信 62 " "小山 徹 42 "
[43] "吉井 利光 35 " "下村 博文 63 " "前田 順一郎 42 "
[46] "宍戸 千絵 39 " "小堤 東 28 " "太田 昭宏 72 "
[49] "池内 沙織 35 " "中村 勝 66 " "鴨下 一郎 68 "
[52] "北條 智彦 34 " "祖父江 元希 42 " "松島 みどり 61 "
[55] "矢作 麻子 39 " "阿藤 和之 46 " "清井 美穂 54 "
[58] "大塚 紀久雄 76 " "秋元 司 46 " "柿沢 未途 46 "
[61] "吉田 年男 69 " "猪野 隆 52 " "大西 英男 71 "
[64] "初鹿 明博 48 " "田村 謙治 49 " "平沢 勝栄 72 "
[67] "西田 主税 55 " "新井 杉生 58 " "菅 直人 71 "
[70] "土屋 正忠 75 " "鴇田 敦 51 " "松本 洋平 44 "
[73] "末松 義規 60 " "佐々木 里加 50 " "杉下 茂雄 68 "
[76] "木原 誠二 47 " "宮本 徹 45 " "鹿野 晃 44 "
[79] "長島 昭久 55 " "小田原 潔 53 " "小糸 健介 35 "
[82] "天木 直人 70 " "伊藤 達也 56 " "山花 郁夫 50 "
[85] "金ケ崎 絵美 41 " "阿部 真 43 " "小倉 将信 36 "
[88] "伊藤 俊輔 38 " "松村 亮佑 37 " "萩生田 光一 54 "
[91] "高橋 斉久 44 " "吉羽 美華 37 " "飯田 美弥子 57 "
[94] "井上 信治 48 " "山下 容子 58 " "小沢 鋭仁 63 "
[97] "井上 宣 43 "
・Use the gsub function to remove the age digits that remain after each name.
tokyo_name <- gsub("[0-9]+","",tokyo_name)
tokyo_name
[1] "海江田 万里 " "山田 美樹 " "松沢 香 " "原口 実季 "
[5] "犬丸 光加 " "又吉 光雄 " "辻 清人 " "松尾 明弘 "
[9] "鳩山 太郎 " "石原 宏高 " "松原 仁 " "香西 克介 "
[13] "平 将明 " "井戸 正枝 " "難波 美智代 " "青山 昂平 "
[17] "若宮 健嗣 " "手塚 仁雄 " "福田 峰之 " "落合 貴之 "
[21] "越智 隆雄 " "植松 恵美子 " "中岡 茉妃 " "長妻 昭 "
[25] "松本 文明 " "荒木 章博 " "井上 郁磨 " "石原 伸晃 "
[29] "吉田 晴美 " "木内 孝胤 " "長内 史子 " "円 より子 "
[33] "斎藤 郁真 " "菅原 一秀 " "高松 智之 " "原 純子 "
[37] "前田 吉成 " "鈴木 隼人 " "鈴木 庸介 " "若狭 勝 "
[41] "岸 良信 " "小山 徹 " "吉井 利光 " "下村 博文 "
[45] "前田 順一郎 " "宍戸 千絵 " "小堤 東 " "太田 昭宏 "
[49] "池内 沙織 " "中村 勝 " "鴨下 一郎 " "北條 智彦 "
[53] "祖父江 元希 " "松島 みどり " "矢作 麻子 " "阿藤 和之 "
[57] "清井 美穂 " "大塚 紀久雄 " "秋元 司 " "柿沢 未途 "
[61] "吉田 年男 " "猪野 隆 " "大西 英男 " "初鹿 明博 "
[65] "田村 謙治 " "平沢 勝栄 " "西田 主税 " "新井 杉生 "
[69] "菅 直人 " "土屋 正忠 " "鴇田 敦 " "松本 洋平 "
[73] "末松 義規 " "佐々木 里加 " "杉下 茂雄 " "木原 誠二 "
[77] "宮本 徹 " "鹿野 晃 " "長島 昭久 " "小田原 潔 "
[81] "小糸 健介 " "天木 直人 " "伊藤 達也 " "山花 郁夫 "
[85] "金ケ崎 絵美 " "阿部 真 " "小倉 将信 " "伊藤 俊輔 "
[89] "松村 亮佑 " "萩生田 光一 " "高橋 斉久 " "吉羽 美華 "
[93] "飯田 美弥子 " "井上 信治 " "山下 容子 " "小沢 鋭仁 "
[97] "井上 宣 "
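・Use the gsub function to remove the two consecutive half-width spaces left at the end of each element, where the age used to be. The pattern " {2}" matches exactly two spaces in a row, so the single space between surname and given name is kept.
tokyo_name <- gsub(" {2}","",tokyo_name)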
tokyo_name
## [1] "海江田 万里" "山田 美樹" "松沢 香" "原口 実季" "犬丸 光加"
## [6] "又吉 光雄" "辻 清人" "松尾 明弘" "鳩山 太郎" "石原 宏高"
## [11] "松原 仁" "香西 克介" "平 将明" "井戸 正枝" "難波 美智代"
## [16] "青山 昂平" "若宮 健嗣" "手塚 仁雄" "福田 峰之" "落合 貴之"
## [21] "越智 隆雄" "植松 恵美子" "中岡 茉妃" "長妻 昭" "松本 文明"
## [26] "荒木 章博" "井上 郁磨" "石原 伸晃" "吉田 晴美" "木内 孝胤"
## [31] "長内 史子" "円 より子" "斎藤 郁真" "菅原 一秀" "高松 智之"
## [36] "原 純子" "前田 吉成" "鈴木 隼人" "鈴木 庸介" "若狭 勝"
## [41] "岸 良信" "小山 徹" "吉井 利光" "下村 博文" "前田 順一郎"
## [46] "宍戸 千絵" "小堤 東" "太田 昭宏" "池内 沙織" "中村 勝"
## [51] "鴨下 一郎" "北條 智彦" "祖父江 元希" "松島 みどり" "矢作 麻子"
## [56] "阿藤 和之" "清井 美穂" "大塚 紀久雄" "秋元 司" "柿沢 未途"
## [61] "吉田 年男" "猪野 隆" "大西 英男" "初鹿 明博" "田村 謙治"
## [66] "平沢 勝栄" "西田 主税" "新井 杉生" "菅 直人" "土屋 正忠"
## [71] "鴇田 敦" "松本 洋平" "末松 義規" "佐々木 里加" "杉下 茂雄"
## [76] "木原 誠二" "宮本 徹" "鹿野 晃" "長島 昭久" "小田原 潔"
## [81] "小糸 健介" "天木 直人" "伊藤 達也" "山花 郁夫" "金ケ崎 絵美"
## [86] "阿部 真" "小倉 将信" "伊藤 俊輔" "松村 亮佑" "萩生田 光一"
## [91] "高橋 斉久" "吉羽 美華" "飯田 美弥子" "井上 信治" "山下 容子"
## [96] "小沢 鋭仁" "井上 宣"
・Use the gsub function to remove the half-width space between the surname and the given name.
tokyo_name <- gsub(" ","",tokyo_name)
tokyo_name
## [1] "海江田万里" "山田美樹" "松沢香" "原口実季" "犬丸光加"
## [6] "又吉光雄" "辻清人" "松尾明弘" "鳩山太郎" "石原宏高"
## [11] "松原仁" "香西克介" "平将明" "井戸正枝" "難波美智代"
## [16] "青山昂平" "若宮健嗣" "手塚仁雄" "福田峰之" "落合貴之"
## [21] "越智隆雄" "植松恵美子" "中岡茉妃" "長妻昭" "松本文明"
## [26] "荒木章博" "井上郁磨" "石原伸晃" "吉田晴美" "木内孝胤"
## [31] "長内史子" "円より子" "斎藤郁真" "菅原一秀" "高松智之"
## [36] "原純子" "前田吉成" "鈴木隼人" "鈴木庸介" "若狭勝"
## [41] "岸良信" "小山徹" "吉井利光" "下村博文" "前田順一郎"
## [46] "宍戸千絵" "小堤東" "太田昭宏" "池内沙織" "中村勝"
## [51] "鴨下一郎" "北條智彦" "祖父江元希" "松島みどり" "矢作麻子"
## [56] "阿藤和之" "清井美穂" "大塚紀久雄" "秋元司" "柿沢未途"
## [61] "吉田年男" "猪野隆" "大西英男" "初鹿明博" "田村謙治"
## [66] "平沢勝栄" "西田主税" "新井杉生" "菅直人" "土屋正忠"
## [71] "鴇田敦" "松本洋平" "末松義規" "佐々木里加" "杉下茂雄"
## [76] "木原誠二" "宮本徹" "鹿野晃" "長島昭久" "小田原潔"
## [81] "小糸健介" "天木直人" "伊藤達也" "山花郁夫" "金ケ崎絵美"
## [86] "阿部真" "小倉将信" "伊藤俊輔" "松村亮佑" "萩生田光一"
## [91] "高橋斉久" "吉羽美華" "飯田美弥子" "井上信治" "山下容子"
## [96] "小沢鋭仁" "井上宣"
・The name information is now in its final form.
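The cleanup steps above can also be condensed into a single substitution by writing back only the capture groups that are needed. This is a sketch under the assumption that every line has the same tag layout as the output shown earlier; pat, full_name, and age_num are illustrative names, not part of the original script.

```r
line <- "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>"

# Group 1 = surname, group 2 = given name, group 3 = age without the parentheses.
pat <- paste0(".*<span class=\"sei\">(.*)</span>",
              "<span class=\"mei\">(.*)</span>",
              "<span class=\"age\">\\((.*)\\)</span>.*")

# Writing back \\1\\2 joins surname and given name with nothing in between.
full_name <- gsub(pat, "\\1\\2", line)

# The same pattern yields the age when only \\3 is written back.
age_num <- as.integer(gsub(pat, "\\3", line))

full_name
age_num
```

Because gsub() is vectorised, the same two calls can be applied to a whole vector of such lines at once. The step-by-step version in the text is longer but makes each intermediate result visible.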
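・Next, collect the age data.
・We already know that the age appears right after each candidate's name.
・As before, find the source lines that contain "sei" and store their positions as age.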
age <- grep("\"sei\"", tokyo)
tokyo[age]
[1] "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>"
[2] "<td class=\"namae\"><div><span class=\"sei\">山田</span><span class=\"mei\">美樹</span><span class=\"age\">(43)</span></div></td>"
[3] "<td class=\"namae\"><div><span class=\"sei\">松沢</span><span class=\"mei\">香</span><span class=\"age\">(39)</span></div></td>"
[4] "<td class=\"namae\"><div><span class=\"sei\">原口</span><span class=\"mei\">実季</span><span class=\"age\">(28)</span></div></td>"
[5] "<td class=\"namae\"><div><span class=\"sei\">犬丸</span><span class=\"mei\">光加</span><span class=\"age\">(57)</span></div></td>"
[6] "<td class=\"namae\"><div><span class=\"sei\">又吉</span><span class=\"mei\">光雄</span><span class=\"age\">(73)</span></div></td>"
[7] "<td class=\"namae\"><div><span class=\"sei\">辻</span><span class=\"mei\">清人</span><span class=\"age\">(38)</span></div></td>"
[8] "<td class=\"namae\"><div><span class=\"sei\">松尾</span><span class=\"mei\">明弘</span><span class=\"age\">(42)</span></div></td>"
[9] "<td class=\"namae\"><div><span class=\"sei\">鳩山</span><span class=\"mei\">太郎</span><span class=\"age\">(43)</span></div></td>"
[10] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">宏高</span><span class=\"age\">(53)</span></div></td>"
[11] "<td class=\"namae\"><div><span class=\"sei\">松原</span><span class=\"mei\">仁</span><span class=\"age\">(61)</span></div></td>"
[12] "<td class=\"namae\"><div><span class=\"sei\">香西</span><span class=\"mei\">克介</span><span class=\"age\">(41)</span></div></td>"
[13] "<td class=\"namae\"><div><span class=\"sei\">平</span><span class=\"mei\">将明</span><span class=\"age\">(50)</span></div></td>"
[14] "<td class=\"namae\"><div><span class=\"sei\">井戸</span><span class=\"mei\">正枝</span><span class=\"age\">(51)</span></div></td>"
[15] "<td class=\"namae\"><div><span class=\"sei\">難波</span><span class=\"mei\">美智代</span><span class=\"age\">(43)</span></div></td>"
[16] "<td class=\"namae\"><div><span class=\"sei\">青山</span><span class=\"mei\">昂平</span><span class=\"age\">(26)</span></div></td>"
[17] "<td class=\"namae\"><div><span class=\"sei\">若宮</span><span class=\"mei\">健嗣</span><span class=\"age\">(56)</span></div></td>"
[18] "<td class=\"namae\"><div><span class=\"sei\">手塚</span><span class=\"mei\">仁雄</span><span class=\"age\">(51)</span></div></td>"
[19] "<td class=\"namae\"><div><span class=\"sei\">福田</span><span class=\"mei\">峰之</span><span class=\"age\">(53)</span></div></td>"
[20] "<td class=\"namae\"><div><span class=\"sei\">落合</span><span class=\"mei\">貴之</span><span class=\"age\">(38)</span></div></td>"
[21] "<td class=\"namae\"><div><span class=\"sei\">越智</span><span class=\"mei\">隆雄</span><span class=\"age\">(53)</span></div></td>"
[22] "<td class=\"namae\"><div><span class=\"sei\">植松</span><span class=\"mei\">恵美子</span><span class=\"age\">(49)</span></div></td>"
[23] "<td class=\"namae\"><div><span class=\"sei\">中岡</span><span class=\"mei\">茉妃</span><span class=\"age\">(26)</span></div></td>"
[24] "<td class=\"namae\"><div><span class=\"sei\">長妻</span><span class=\"mei\">昭</span><span class=\"age\">(57)</span></div></td>"
[25] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">文明</span><span class=\"age\">(68)</span></div></td>"
[26] "<td class=\"namae\"><div><span class=\"sei\">荒木</span><span class=\"mei\">章博</span><span class=\"age\">(64)</span></div></td>"
[27] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">郁磨</span><span class=\"age\">(26)</span></div></td>"
[28] "<td class=\"namae\"><div><span class=\"sei\">石原</span><span class=\"mei\">伸晃</span><span class=\"age\">(60)</span></div></td>"
[29] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">晴美</span><span class=\"age\">(45)</span></div></td>"
[30] "<td class=\"namae\"><div><span class=\"sei\">木内</span><span class=\"mei\">孝胤</span><span class=\"age\">(51)</span></div></td>"
[31] "<td class=\"namae\"><div><span class=\"sei\">長内</span><span class=\"mei\">史子</span><span class=\"age\">(29)</span></div></td>"
[32] "<td class=\"namae\"><div><span class=\"sei\">円</span><span class=\"mei\">より子</span><span class=\"age\">(70)</span></div></td>"
[33] "<td class=\"namae\"><div><span class=\"sei\">斎藤</span><span class=\"mei\">郁真</span><span class=\"age\">(29)</span></div></td>"
[34] "<td class=\"namae\"><div><span class=\"sei\">菅原</span><span class=\"mei\">一秀</span><span class=\"age\">(55)</span></div></td>"
[35] "<td class=\"namae\"><div><span class=\"sei\">高松</span><span class=\"mei\">智之</span><span class=\"age\">(43)</span></div></td>"
[36] "<td class=\"namae\"><div><span class=\"sei\">原</span><span class=\"mei\">純子</span><span class=\"age\">(53)</span></div></td>"
[37] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">吉成</span><span class=\"age\">(62)</span></div></td>"
[38] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">隼人</span><span class=\"age\">(40)</span></div></td>"
[39] "<td class=\"namae\"><div><span class=\"sei\">鈴木</span><span class=\"mei\">庸介</span><span class=\"age\">(41)</span></div></td>"
[40] "<td class=\"namae\"><div><span class=\"sei\">若狭</span><span class=\"mei\">勝</span><span class=\"age\">(60)</span></div></td>"
[41] "<td class=\"namae\"><div><span class=\"sei\">岸</span><span class=\"mei\">良信</span><span class=\"age\">(62)</span></div></td>"
[42] "<td class=\"namae\"><div><span class=\"sei\">小山</span><span class=\"mei\">徹</span><span class=\"age\">(42)</span></div></td>"
[43] "<td class=\"namae\"><div><span class=\"sei\">吉井</span><span class=\"mei\">利光</span><span class=\"age\">(35)</span></div></td>"
[44] "<td class=\"namae\"><div><span class=\"sei\">下村</span><span class=\"mei\">博文</span><span class=\"age\">(63)</span></div></td>"
[45] "<td class=\"namae\"><div><span class=\"sei\">前田</span><span class=\"mei\">順一郎</span><span class=\"age\">(42)</span></div></td>"
[46] "<td class=\"namae\"><div><span class=\"sei\">宍戸</span><span class=\"mei\">千絵</span><span class=\"age\">(39)</span></div></td>"
[47] "<td class=\"namae\"><div><span class=\"sei\">小堤</span><span class=\"mei\">東</span><span class=\"age\">(28)</span></div></td>"
[48] "<td class=\"namae\"><div><span class=\"sei\">太田</span><span class=\"mei\">昭宏</span><span class=\"age\">(72)</span></div></td>"
[49] "<td class=\"namae\"><div><span class=\"sei\">池内</span><span class=\"mei\">沙織</span><span class=\"age\">(35)</span></div></td>"
[50] "<td class=\"namae\"><div><span class=\"sei\">中村</span><span class=\"mei\">勝</span><span class=\"age\">(66)</span></div></td>"
[51] "<td class=\"namae\"><div><span class=\"sei\">鴨下</span><span class=\"mei\">一郎</span><span class=\"age\">(68)</span></div></td>"
[52] "<td class=\"namae\"><div><span class=\"sei\">北條</span><span class=\"mei\">智彦</span><span class=\"age\">(34)</span></div></td>"
[53] "<td class=\"namae\"><div><span class=\"sei\">祖父江</span><span class=\"mei\">元希</span><span class=\"age\">(42)</span></div></td>"
[54] "<td class=\"namae\"><div><span class=\"sei\">松島</span><span class=\"mei\">みどり</span><span class=\"age\">(61)</span></div></td>"
[55] "<td class=\"namae\"><div><span class=\"sei\">矢作</span><span class=\"mei\">麻子</span><span class=\"age\">(39)</span></div></td>"
[56] "<td class=\"namae\"><div><span class=\"sei\">阿藤</span><span class=\"mei\">和之</span><span class=\"age\">(46)</span></div></td>"
[57] "<td class=\"namae\"><div><span class=\"sei\">清井</span><span class=\"mei\">美穂</span><span class=\"age\">(54)</span></div></td>"
[58] "<td class=\"namae\"><div><span class=\"sei\">大塚</span><span class=\"mei\">紀久雄</span><span class=\"age\">(76)</span></div></td>"
[59] "<td class=\"namae\"><div><span class=\"sei\">秋元</span><span class=\"mei\">司</span><span class=\"age\">(46)</span></div></td>"
[60] "<td class=\"namae\"><div><span class=\"sei\">柿沢</span><span class=\"mei\">未途</span><span class=\"age\">(46)</span></div></td>"
[61] "<td class=\"namae\"><div><span class=\"sei\">吉田</span><span class=\"mei\">年男</span><span class=\"age\">(69)</span></div></td>"
[62] "<td class=\"namae\"><div><span class=\"sei\">猪野</span><span class=\"mei\">隆</span><span class=\"age\">(52)</span></div></td>"
[63] "<td class=\"namae\"><div><span class=\"sei\">大西</span><span class=\"mei\">英男</span><span class=\"age\">(71)</span></div></td>"
[64] "<td class=\"namae\"><div><span class=\"sei\">初鹿</span><span class=\"mei\">明博</span><span class=\"age\">(48)</span></div></td>"
[65] "<td class=\"namae\"><div><span class=\"sei\">田村</span><span class=\"mei\">謙治</span><span class=\"age\">(49)</span></div></td>"
[66] "<td class=\"namae\"><div><span class=\"sei\">平沢</span><span class=\"mei\">勝栄</span><span class=\"age\">(72)</span></div></td>"
[67] "<td class=\"namae\"><div><span class=\"sei\">西田</span><span class=\"mei\">主税</span><span class=\"age\">(55)</span></div></td>"
[68] "<td class=\"namae\"><div><span class=\"sei\">新井</span><span class=\"mei\">杉生</span><span class=\"age\">(58)</span></div></td>"
[69] "<td class=\"namae\"><div><span class=\"sei\">菅</span><span class=\"mei\">直人</span><span class=\"age\">(71)</span></div></td>"
[70] "<td class=\"namae\"><div><span class=\"sei\">土屋</span><span class=\"mei\">正忠</span><span class=\"age\">(75)</span></div></td>"
[71] "<td class=\"namae\"><div><span class=\"sei\">鴇田</span><span class=\"mei\">敦</span><span class=\"age\">(51)</span></div></td>"
[72] "<td class=\"namae\"><div><span class=\"sei\">松本</span><span class=\"mei\">洋平</span><span class=\"age\">(44)</span></div></td>"
[73] "<td class=\"namae\"><div><span class=\"sei\">末松</span><span class=\"mei\">義規</span><span class=\"age\">(60)</span></div></td>"
[74] "<td class=\"namae\"><div><span class=\"sei\">佐々木</span><span class=\"mei\">里加</span><span class=\"age\">(50)</span></div></td>"
[75] "<td class=\"namae\"><div><span class=\"sei\">杉下</span><span class=\"mei\">茂雄</span><span class=\"age\">(68)</span></div></td>"
[76] "<td class=\"namae\"><div><span class=\"sei\">木原</span><span class=\"mei\">誠二</span><span class=\"age\">(47)</span></div></td>"
[77] "<td class=\"namae\"><div><span class=\"sei\">宮本</span><span class=\"mei\">徹</span><span class=\"age\">(45)</span></div></td>"
[78] "<td class=\"namae\"><div><span class=\"sei\">鹿野</span><span class=\"mei\">晃</span><span class=\"age\">(44)</span></div></td>"
[79] "<td class=\"namae\"><div><span class=\"sei\">長島</span><span class=\"mei\">昭久</span><span class=\"age\">(55)</span></div></td>"
[80] "<td class=\"namae\"><div><span class=\"sei\">小田原</span><span class=\"mei\">潔</span><span class=\"age\">(53)</span></div></td>"
[81] "<td class=\"namae\"><div><span class=\"sei\">小糸</span><span class=\"mei\">健介</span><span class=\"age\">(35)</span></div></td>"
[82] "<td class=\"namae\"><div><span class=\"sei\">天木</span><span class=\"mei\">直人</span><span class=\"age\">(70)</span></div></td>"
[83] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">達也</span><span class=\"age\">(56)</span></div></td>"
[84] "<td class=\"namae\"><div><span class=\"sei\">山花</span><span class=\"mei\">郁夫</span><span class=\"age\">(50)</span></div></td>"
[85] "<td class=\"namae\"><div><span class=\"sei\">金ケ崎</span><span class=\"mei\">絵美</span><span class=\"age\">(41)</span></div></td>"
[86] "<td class=\"namae\"><div><span class=\"sei\">阿部</span><span class=\"mei\">真</span><span class=\"age\">(43)</span></div></td>"
[87] "<td class=\"namae\"><div><span class=\"sei\">小倉</span><span class=\"mei\">将信</span><span class=\"age\">(36)</span></div></td>"
[88] "<td class=\"namae\"><div><span class=\"sei\">伊藤</span><span class=\"mei\">俊輔</span><span class=\"age\">(38)</span></div></td>"
[89] "<td class=\"namae\"><div><span class=\"sei\">松村</span><span class=\"mei\">亮佑</span><span class=\"age\">(37)</span></div></td>"
[90] "<td class=\"namae\"><div><span class=\"sei\">萩生田</span><span class=\"mei\">光一</span><span class=\"age\">(54)</span></div></td>"
[91] "<td class=\"namae\"><div><span class=\"sei\">高橋</span><span class=\"mei\">斉久</span><span class=\"age\">(44)</span></div></td>"
[92] "<td class=\"namae\"><div><span class=\"sei\">吉羽</span><span class=\"mei\">美華</span><span class=\"age\">(37)</span></div></td>"
[93] "<td class=\"namae\"><div><span class=\"sei\">飯田</span><span class=\"mei\">美弥子</span><span class=\"age\">(57)</span></div></td>"
[94] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">信治</span><span class=\"age\">(48)</span></div></td>"
[95] "<td class=\"namae\"><div><span class=\"sei\">山下</span><span class=\"mei\">容子</span><span class=\"age\">(58)</span></div></td>"
[96] "<td class=\"namae\"><div><span class=\"sei\">小沢</span><span class=\"mei\">鋭仁</span><span class=\"age\">(63)</span></div></td>"
[97] "<td class=\"namae\"><div><span class=\"sei\">井上</span><span class=\"mei\">宣</span><span class=\"age\">(43)</span></div></td>"
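・Use gsub() to strip the HTML tags, keeping the surname, given name, and age, and store the result as tokyo_age. The pattern has only three capture groups, so \\4 in the replacement expands to nothing, which is why each element in the output ends with a trailing ", ".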
tokyo_age <- gsub("<td class=\"namae\"><div><span class=\"sei\">(.*)</span><span class=\"mei\">(.*)</span><span class=\"age\">(.*)</span></div></td>","\\1, \\2, \\3, \\4", tokyo[age])
tokyo_age
[1] "海江田, 万里, (68), " "山田, 美樹, (43), " "松沢, 香, (39), "
[4] "原口, 実季, (28), " "犬丸, 光加, (57), " "又吉, 光雄, (73), "
[7] "辻, 清人, (38), " "松尾, 明弘, (42), " "鳩山, 太郎, (43), "
[10] "石原, 宏高, (53), " "松原, 仁, (61), " "香西, 克介, (41), "
[13] "平, 将明, (50), " "井戸, 正枝, (51), " "難波, 美智代, (43), "
[16] "青山, 昂平, (26), " "若宮, 健嗣, (56), " "手塚, 仁雄, (51), "
[19] "福田, 峰之, (53), " "落合, 貴之, (38), " "越智, 隆雄, (53), "
[22] "植松, 恵美子, (49), " "中岡, 茉妃, (26), " "長妻, 昭, (57), "
[25] "松本, 文明, (68), " "荒木, 章博, (64), " "井上, 郁磨, (26), "
[28] "石原, 伸晃, (60), " "吉田, 晴美, (45), " "木内, 孝胤, (51), "
[31] "長内, 史子, (29), " "円, より子, (70), " "斎藤, 郁真, (29), "
[34] "菅原, 一秀, (55), " "高松, 智之, (43), " "原, 純子, (53), "
[37] "前田, 吉成, (62), " "鈴木, 隼人, (40), " "鈴木, 庸介, (41), "
[40] "若狭, 勝, (60), " "岸, 良信, (62), " "小山, 徹, (42), "
[43] "吉井, 利光, (35), " "下村, 博文, (63), " "前田, 順一郎, (42), "
[46] "宍戸, 千絵, (39), " "小堤, 東, (28), " "太田, 昭宏, (72), "
[49] "池内, 沙織, (35), " "中村, 勝, (66), " "鴨下, 一郎, (68), "
[52] "北條, 智彦, (34), " "祖父江, 元希, (42), " "松島, みどり, (61), "
[55] "矢作, 麻子, (39), " "阿藤, 和之, (46), " "清井, 美穂, (54), "
[58] "大塚, 紀久雄, (76), " "秋元, 司, (46), " "柿沢, 未途, (46), "
[61] "吉田, 年男, (69), " "猪野, 隆, (52), " "大西, 英男, (71), "
[64] "初鹿, 明博, (48), " "田村, 謙治, (49), " "平沢, 勝栄, (72), "
[67] "西田, 主税, (55), " "新井, 杉生, (58), " "菅, 直人, (71), "
[70] "土屋, 正忠, (75), " "鴇田, 敦, (51), " "松本, 洋平, (44), "
[73] "末松, 義規, (60), " "佐々木, 里加, (50), " "杉下, 茂雄, (68), "
[76] "木原, 誠二, (47), " "宮本, 徹, (45), " "鹿野, 晃, (44), "
[79] "長島, 昭久, (55), " "小田原, 潔, (53), " "小糸, 健介, (35), "
[82] "天木, 直人, (70), " "伊藤, 達也, (56), " "山花, 郁夫, (50), "
[85] "金ケ崎, 絵美, (41), " "阿部, 真, (43), " "小倉, 将信, (36), "
[88] "伊藤, 俊輔, (38), " "松村, 亮佑, (37), " "萩生田, 光一, (54), "
[91] "高橋, 斉久, (44), " "吉羽, 美華, (37), " "飯田, 美弥子, (57), "
[94] "井上, 信治, (48), " "山下, 容子, (58), " "小沢, 鋭仁, (63), "
[97] "井上, 宣, (43), "
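・As a side note, here is a minimal sketch of how the backreferences \\1 to \\3 map onto the three capture groups, using the first line of the data. The escaped parentheses \\( and \\) around the third group are an addition made for this sketch and are not part of the original pattern; they keep the literal brackets out of the captured age. The names line, pattern, surname, given, and age_chr are hypothetical.

```r
# One line of the scraped source (first candidate in the output above)
line <- "<td class=\"namae\"><div><span class=\"sei\">海江田</span><span class=\"mei\">万里</span><span class=\"age\">(68)</span></div></td>"

# Three capture groups: surname, given name, age.
# \\( and \\) match the literal brackets so they stay outside group 3.
pattern <- "<td class=\"namae\"><div><span class=\"sei\">(.*)</span><span class=\"mei\">(.*)</span><span class=\"age\">\\((.*)\\)</span></div></td>"

surname <- sub(pattern, "\\1", line)  # text matched by group 1
given   <- sub(pattern, "\\2", line)  # text matched by group 2
age_chr <- sub(pattern, "\\3", line)  # text matched by group 3

# Convert the captured age from character to integer
age_int <- as.integer(age_chr)
print(age_int)
```

Referring only to \\3 would return the age directly, without the intermediate cleanup of names, commas, and brackets.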
・Next, remove everything except the age.
・Use gsub() to delete the parentheses.
tokyo_age <- gsub("[()]","",tokyo_age)
tokyo_age
[1] "海江田, 万里, 68, " "山田, 美樹, 43, " "松沢, 香, 39, "
[4] "原口, 実季, 28, " "犬丸, 光加, 57, " "又吉, 光雄, 73, "
[7] "辻, 清人, 38, " "松尾, 明弘, 42, " "鳩山, 太郎, 43, "
[10] "石原, 宏高, 53, " "松原, 仁, 61, " "香西, 克介, 41, "
[13] "平, 将明, 50, " "井戸, 正枝, 51, " "難波, 美智代, 43, "
[16] "青山, 昂平, 26, " "若宮, 健嗣, 56, " "手塚, 仁雄, 51, "
[19] "福田, 峰之, 53, " "落合, 貴之, 38, " "越智, 隆雄, 53, "
[22] "植松, 恵美子, 49, " "中岡, 茉妃, 26, " "長妻, 昭, 57, "
[25] "松本, 文明, 68, " "荒木, 章博, 64, " "井上, 郁磨, 26, "
[28] "石原, 伸晃, 60, " "吉田, 晴美, 45, " "木内, 孝胤, 51, "
[31] "長内, 史子, 29, " "円, より子, 70, " "斎藤, 郁真, 29, "
[34] "菅原, 一秀, 55, " "高松, 智之, 43, " "原, 純子, 53, "
[37] "前田, 吉成, 62, " "鈴木, 隼人, 40, " "鈴木, 庸介, 41, "
[40] "若狭, 勝, 60, " "岸, 良信, 62, " "小山, 徹, 42, "
[43] "吉井, 利光, 35, " "下村, 博文, 63, " "前田, 順一郎, 42, "
[46] "宍戸, 千絵, 39, " "小堤, 東, 28, " "太田, 昭宏, 72, "
[49] "池内, 沙織, 35, " "中村, 勝, 66, " "鴨下, 一郎, 68, "
[52] "北條, 智彦, 34, " "祖父江, 元希, 42, " "松島, みどり, 61, "
[55] "矢作, 麻子, 39, " "阿藤, 和之, 46, " "清井, 美穂, 54, "
[58] "大塚, 紀久雄, 76, " "秋元, 司, 46, " "柿沢, 未途, 46, "
[61] "吉田, 年男, 69, " "猪野, 隆, 52, " "大西, 英男, 71, "
[64] "初鹿, 明博, 48, " "田村, 謙治, 49, " "平沢, 勝栄, 72, "
[67] "西田, 主税, 55, " "新井, 杉生, 58, " "菅, 直人, 71, "
[70] "土屋, 正忠, 75, " "鴇田, 敦, 51, " "松本, 洋平, 44, "
[73] "末松, 義規, 60, " "佐々木, 里加, 50, " "杉下, 茂雄, 68, "
[76] "木原, 誠二, 47, " "宮本, 徹, 45, " "鹿野, 晃, 44, "
[79] "長島, 昭久, 55, " "小田原, 潔, 53, " "小糸, 健介, 35, "
[82] "天木, 直人, 70, " "伊藤, 達也, 56, " "山花, 郁夫, 50, "
[85] "金ケ崎, 絵美, 41, " "阿部, 真, 43, " "小倉, 将信, 36, "
[88] "伊藤, 俊輔, 38, " "松村, 亮佑, 37, " "萩生田, 光一, 54, "
[91] "高橋, 斉久, 44, " "吉羽, 美華, 37, " "飯田, 美弥子, 57, "
[94] "井上, 信治, 48, " "山下, 容子, 58, " "小沢, 鋭仁, 63, "
[97] "井上, 宣, 43, "
・Use gsub() to delete the commas.
tokyo_age <- gsub(",","",tokyo_age)
tokyo_age
[1] "海江田 万里 68 " "山田 美樹 43 " "松沢 香 39 "
[4] "原口 実季 28 " "犬丸 光加 57 " "又吉 光雄 73 "
[7] "辻 清人 38 " "松尾 明弘 42 " "鳩山 太郎 43 "
[10] "石原 宏高 53 " "松原 仁 61 " "香西 克介 41 "
[13] "平 将明 50 " "井戸 正枝 51 " "難波 美智代 43 "
[16] "青山 昂平 26 " "若宮 健嗣 56 " "手塚 仁雄 51 "
[19] "福田 峰之 53 " "落合 貴之 38 " "越智 隆雄 53 "
[22] "植松 恵美子 49 " "中岡 茉妃 26 " "長妻 昭 57 "
[25] "松本 文明 68 " "荒木 章博 64 " "井上 郁磨 26 "
[28] "石原 伸晃 60 " "吉田 晴美 45 " "木内 孝胤 51 "
[31] "長内 史子 29 " "円 より子 70 " "斎藤 郁真 29 "
[34] "菅原 一秀 55 " "高松 智之 43 " "原 純子 53 "
[37] "前田 吉成 62 " "鈴木 隼人 40 " "鈴木 庸介 41 "
[40] "若狭 勝 60 " "岸 良信 62 " "小山 徹 42 "
[43] "吉井 利光 35 " "下村 博文 63 " "前田 順一郎 42 "
[46] "宍戸 千絵 39 " "小堤 東 28 " "太田 昭宏 72 "
[49] "池内 沙織 35 " "中村 勝 66 " "鴨下 一郎 68 "
[52] "北條 智彦 34 " "祖父江 元希 42 " "松島 みどり 61 "
[55] "矢作 麻子 39 " "阿藤 和之 46 " "清井 美穂 54 "
[58] "大塚 紀久雄 76 " "秋元 司 46 " "柿沢 未途 46 "
[61] "吉田 年男 69 " "猪野 隆 52 " "大西 英男 71 "
[64] "初鹿 明博 48 " "田村 謙治 49 " "平沢 勝栄 72 "
[67] "西田 主税 55 " "新井 杉生 58 " "菅 直人 71 "
[70] "土屋 正忠 75 " "鴇田 敦 51 " "松本 洋平 44 "
[73] "末松 義規 60 " "佐々木 里加 50 " "杉下 茂雄 68 "
[76] "木原 誠二 47 " "宮本 徹 45 " "鹿野 晃 44 "
[79] "長島 昭久 55 " "小田原 潔 53 " "小糸 健介 35 "
[82] "天木 直人 70 " "伊藤 達也 56 " "山花 郁夫 50 "
[85] "金ケ崎 絵美 41 " "阿部 真 43 " "小倉 将信 36 "
[88] "伊藤 俊輔 38 " "松村 亮佑 37 " "萩生田 光一 54 "
[91] "高橋 斉久 44 " "吉羽 美華 37 " "飯田 美弥子 57 "
[94] "井上 信治 48 " "山下 容子 58 " "小沢 鋭仁 63 "
[97] "井上 宣 43 "
・The goal is to remove the names and keep only the numbers.
・Some candidates have hiragana in their names.
・Delete the hiragana characters.
tokyo_age <- gsub("[あ-ん]","",tokyo_age)
tokyo_age
[1] "海江田 万里 68 " "山田 美樹 43 " "松沢 香 39 "
[4] "原口 実季 28 " "犬丸 光加 57 " "又吉 光雄 73 "
[7] "辻 清人 38 " "松尾 明弘 42 " "鳩山 太郎 43 "
[10] "石原 宏高 53 " "松原 仁 61 " "香西 克介 41 "
[13] "平 将明 50 " "井戸 正枝 51 " "難波 美智代 43 "
[16] "青山 昂平 26 " "若宮 健嗣 56 " "手塚 仁雄 51 "
[19] "福田 峰之 53 " "落合 貴之 38 " "越智 隆雄 53 "
[22] "植松 恵美子 49 " "中岡 茉妃 26 " "長妻 昭 57 "
[25] "松本 文明 68 " "荒木 章博 64 " "井上 郁磨 26 "
[28] "石原 伸晃 60 " "吉田 晴美 45 " "木内 孝胤 51 "
[31] "長内 史子 29 " "円 子 70 " "斎藤 郁真 29 "
[34] "菅原 一秀 55 " "高松 智之 43 " "原 純子 53 "
[37] "前田 吉成 62 " "鈴木 隼人 40 " "鈴木 庸介 41 "
[40] "若狭 勝 60 " "岸 良信 62 " "小山 徹 42 "
[43] "吉井 利光 35 " "下村 博文 63 " "前田 順一郎 42 "
[46] "宍戸 千絵 39 " "小堤 東 28 " "太田 昭宏 72 "
[49] "池内 沙織 35 " "中村 勝 66 " "鴨下 一郎 68 "
[52] "北條 智彦 34 " "祖父江 元希 42 " "松島 61 "
[55] "矢作 麻子 39 " "阿藤 和之 46 " "清井 美穂 54 "
[58] "大塚 紀久雄 76 " "秋元 司 46 " "柿沢 未途 46 "
[61] "吉田 年男 69 " "猪野 隆 52 " "大西 英男 71 "
[64] "初鹿 明博 48 " "田村 謙治 49 " "平沢 勝栄 72 "
[67] "西田 主税 55 " "新井 杉生 58 " "菅 直人 71 "
[70] "土屋 正忠 75 " "鴇田 敦 51 " "松本 洋平 44 "
[73] "末松 義規 60 " "佐々木 里加 50 " "杉下 茂雄 68 "
[76] "木原 誠二 47 " "宮本 徹 45 " "鹿野 晃 44 "
[79] "長島 昭久 55 " "小田原 潔 53 " "小糸 健介 35 "
[82] "天木 直人 70 " "伊藤 達也 56 " "山花 郁夫 50 "
[85] "金ケ崎 絵美 41 " "阿部 真 43 " "小倉 将信 36 "
[88] "伊藤 俊輔 38 " "松村 亮佑 37 " "萩生田 光一 54 "
[91] "高橋 斉久 44 " "吉羽 美華 37 " "飯田 美弥子 57 "
[94] "井上 信治 48 " "山下 容子 58 " "小沢 鋭仁 63 "
[97] "井上 宣 43 "
・Use gsub() to delete the half-width spaces, including the one between the surname and the given name.
tokyo_age <- gsub(" ","",tokyo_age)
tokyo_age
[1] "海江田万里68" "山田美樹43" "松沢香39" "原口実季28"
[5] "犬丸光加57" "又吉光雄73" "辻清人38" "松尾明弘42"
[9] "鳩山太郎43" "石原宏高53" "松原仁61" "香西克介41"
[13] "平将明50" "井戸正枝51" "難波美智代43" "青山昂平26"
[17] "若宮健嗣56" "手塚仁雄51" "福田峰之53" "落合貴之38"
[21] "越智隆雄53" "植松恵美子49" "中岡茉妃26" "長妻昭57"
[25] "松本文明68" "荒木章博64" "井上郁磨26" "石原伸晃60"
[29] "吉田晴美45" "木内孝胤51" "長内史子29" "円子70"
[33] "斎藤郁真29" "菅原一秀55" "高松智之43" "原純子53"
[37] "前田吉成62" "鈴木隼人40" "鈴木庸介41" "若狭勝60"
[41] "岸良信62" "小山徹42" "吉井利光35" "下村博文63"
[45] "前田順一郎42" "宍戸千絵39" "小堤東28" "太田昭宏72"
[49] "池内沙織35" "中村勝66" "鴨下一郎68" "北條智彦34"
[53] "祖父江元希42" "松島61" "矢作麻子39" "阿藤和之46"
[57] "清井美穂54" "大塚紀久雄76" "秋元司46" "柿沢未途46"
[61] "吉田年男69" "猪野隆52" "大西英男71" "初鹿明博48"
[65] "田村謙治49" "平沢勝栄72" "西田主税55" "新井杉生58"
[69] "菅直人71" "土屋正忠75" "鴇田敦51" "松本洋平44"
[73] "末松義規60" "佐々木里加50" "杉下茂雄68" "木原誠二47"
[77] "宮本徹45" "鹿野晃44" "長島昭久55" "小田原潔53"
[81] "小糸健介35" "天木直人70" "伊藤達也56" "山花郁夫50"
[85] "金ケ崎絵美41" "阿部真43" "小倉将信36" "伊藤俊輔38"
[89] "松村亮佑37" "萩生田光一54" "高橋斉久44" "吉羽美華37"
[93] "飯田美弥子57" "井上信治48" "山下容子58" "小沢鋭仁63"
[97] "井上宣43"
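・Some candidates have kanji numerals in their names.
・Use gsub() to delete the kanji numerals. Note that [一-十] is a range of code points (U+4E00 to U+5341), not a list of numerals, so it also deletes every other kanji that falls in that range, as the output shows.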
tokyo_age <- gsub("[一-十]", "", tokyo_age)
tokyo_age
[1] "海江田里68" "山田美樹43" "松沢香39" "原口実季28"
[5] "犬57" "又吉雄73" "辻清38" "松尾明弘42"
[9] "鳩山太郎43" "石原宏高53" "松原61" "香西41"
[13] "平将明50" "戸正枝51" "難波美智43" "青山昂平26"
[17] "若宮嗣56" "手塚雄51" "福田峰53" "落合貴38"
[21] "越智隆雄53" "植松恵美子49" "岡茉妃26" "長妻昭57"
[25] "松本文明68" "荒木章博64" "郁磨26" "石原晃60"
[29] "吉田晴美45" "木孝胤51" "長史子29" "子70"
[33] "斎藤郁真29" "菅原秀55" "高松智43" "原純子53"
[37] "田吉成62" "鈴木隼40" "鈴木庸41" "若狭60"
[41] "岸良62" "小山徹42" "吉35" "村博文63"
[45] "田順郎42" "宍戸千絵39" "小堤東28" "太田昭宏72"
[49] "池沙織35" "村66" "鴨郎68" "條智彦34"
[53] "祖父江希42" "松島61" "矢麻子39" "阿藤和46"
[57] "清美穂54" "大塚紀雄76" "秋司46" "柿沢未途46"
[61] "吉田年男69" "猪野隆52" "大西英男71" "鹿明博48"
[65] "田村謙治49" "平沢栄72" "西田税55" "新杉生58"
[69] "菅直71" "土屋正忠75" "鴇田敦51" "松本洋平44"
[73] "末松義規60" "々木里50" "杉茂雄68" "木原誠47"
[77] "宮本徹45" "鹿野晃44" "長島昭55" "小田原潔53"
[81] "小糸35" "天木直70" "藤達56" "山花郁夫50"
[85] "金ケ崎絵美41" "阿部真43" "小将36" "藤輔38"
[89] "松村37" "萩生田54" "高橋斉44" "吉羽美華37"
[93] "飯田美弥子57" "治48" "山容子58" "小沢鋭63"
[97] "宣43"
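・To see why so many non-numeral kanji disappeared in the step above, the following sketch compares code points directly. It assumes the script is read as UTF-8; the names lo, hi, chars, cp, and in_range are hypothetical.

```r
# Code points of the two endpoints of the range [一-十]
lo <- utf8ToInt("一")  # 19968, i.e. U+4E00
hi <- utf8ToInt("十")  # 21313, i.e. U+5341

# A few characters to test: three from candidate names that were deleted,
# one that survived, and two kanji numerals
chars <- c("丸", "光", "加", "犬", "一", "四")
cp <- vapply(chars, utf8ToInt, integer(1))

# TRUE means the character lies inside the range and would be deleted
in_range <- cp >= lo & cp <= hi
print(in_range)
```

The characters 丸, 光, and 加 fall inside the range although they are not numerals, while the numeral 四 (U+56DB) falls outside it. Listing the numerals explicitly in the character class would avoid both effects.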
・Use gsub() to delete the remaining kanji in the names.
tokyo_age <- gsub("[亜-黑]", "", tokyo_age)
tokyo_age
[1] "68" "43" "39" "28" "57" "73" "38" "42" "43" "53"
[11] "61" "41" "50" "51" "43" "26" "56" "51" "53" "38"
[21] "53" "49" "26" "57" "68" "64" "26" "60" "45" "51"
[31] "29" "70" "29" "55" "43" "53" "62" "40" "41" "60"
[41] "62" "42" "35" "63" "42" "39" "28" "72" "35" "66"
[51] "68" "34" "42" "61" "39" "46" "54" "76" "46" "46"
[61] "69" "52" "71" "48" "49" "72" "55" "58" "71" "75"
[71] "51" "44" "60" "々50" "68" "47" "45" "44" "55" "53"
[81] "35" "70" "56" "50" "ケ41" "43" "36" "38" "37" "54"
[91] "44" "37" "57" "48" "58" "63" "43"
・Only two candidates still have a non-digit character left.
・Delete those characters, "ケ" and "々".
・Use gsub() to delete "ケ".
tokyo_age <- gsub("ケ", "", tokyo_age)
tokyo_age
[1] "68" "43" "39" "28" "57" "73" "38" "42" "43" "53"
[11] "61" "41" "50" "51" "43" "26" "56" "51" "53" "38"
[21] "53" "49" "26" "57" "68" "64" "26" "60" "45" "51"
[31] "29" "70" "29" "55" "43" "53" "62" "40" "41" "60"
[41] "62" "42" "35" "63" "42" "39" "28" "72" "35" "66"
[51] "68" "34" "42" "61" "39" "46" "54" "76" "46" "46"
[61] "69" "52" "71" "48" "49" "72" "55" "58" "71" "75"
[71] "51" "44" "60" "々50" "68" "47" "45" "44" "55" "53"
[81] "35" "70" "56" "50" "41" "43" "36" "38" "37" "54"
[91] "44" "37" "57" "48" "58" "63" "43"
・Use gsub() to delete "々".
tokyo_age <- gsub("々", "", tokyo_age)
tokyo_age
[1] "68" "43" "39" "28" "57" "73" "38" "42" "43" "53" "61" "41" "50" "51"
[15] "43" "26" "56" "51" "53" "38" "53" "49" "26" "57" "68" "64" "26" "60"
[29] "45" "51" "29" "70" "29" "55" "43" "53" "62" "40" "41" "60" "62" "42"
[43] "35" "63" "42" "39" "28" "72" "35" "66" "68" "34" "42" "61" "39" "46"
[57] "54" "76" "46" "46" "69" "52" "71" "48" "49" "72" "55" "58" "71" "75"
[71] "51" "44" "60" "50" "68" "47" "45" "44" "55" "53" "35" "70" "56" "50"
[85] "41" "43" "36" "38" "37" "54" "44" "37" "57" "48" "58" "63" "43"
・Only the age information now remains.
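・For comparison, a shorter route is sketched here. It assumes that the age is the only run of digits in each string, which holds for the data shown above. The negated character class [^0-9] deletes every non-digit at once, so hiragana, kanji, "ケ", and "々" need no separate steps. The names x, ages_chr, and ages are hypothetical.

```r
# Three intermediate strings in the form produced by the first gsub() step
x <- c("海江田, 万里, (68), ",
       "佐々木, 里加, (50), ",
       "金ケ崎, 絵美, (41), ")

# Delete everything that is not a digit
ages_chr <- gsub("[^0-9]", "", x)

# Convert from character to integer so the ages can be used in calculations
ages <- as.integer(ages_chr)
print(ages)
```

Converting to integer at the end also makes functions such as mean() or summary() usable on the result.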
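・Viewing the page source of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's vote count follows the class name num.
・First, use grep() to find the lines of tokyo that contain the vote counts, and store their indices as count.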
count <- grep("<td class=\"num\"", tokyo)
tokyo[count]
[1] "<td class=\"num\"><div>96,255<span>40.69<span class=\"tani\">%</span></span></div></td>"
[2] "<td class=\"num\"><div>93,234<span>39.41<span class=\"tani\">%</span></span></div></td>"
[3] "<td class=\"num\"><div>40,376<span>17.07<span class=\"tani\">%</span></span></div></td>"
[4] "<td class=\"num\"><div>3,806<span>1.61<span class=\"tani\">%</span></span></div></td>"
[5] "<td class=\"num\"><div>1,570<span>0.66<span class=\"tani\">%</span></span></div></td>"
[6] "<td class=\"num\"><div>1,307<span>0.55<span class=\"tani\">%</span></span></div></td>"
[7] "<td class=\"num\"><div>112,993<span>45.90<span class=\"tani\">%</span></span></div></td>"
[8] "<td class=\"num\"><div>91,230<span>37.06<span class=\"tani\">%</span></span></div></td>"
[9] "<td class=\"num\"><div>41,955<span>17.04<span class=\"tani\">%</span></span></div></td>"
[10] "<td class=\"num\"><div>107,708<span>43.58<span class=\"tani\">%</span></span></div></td>"
[11] "<td class=\"num\"><div>94,380<span>38.18<span class=\"tani\">%</span></span></div></td>"
[12] "<td class=\"num\"><div>45,088<span>18.24<span class=\"tani\">%</span></span></div></td>"
[13] "<td class=\"num\"><div>115,239<span>50.08<span class=\"tani\">%</span></span></div></td>"
[14] "<td class=\"num\"><div>53,480<span>23.24<span class=\"tani\">%</span></span></div></td>"
[15] "<td class=\"num\"><div>35,352<span>15.36<span class=\"tani\">%</span></span></div></td>"
[16] "<td class=\"num\"><div>26,037<span>11.32<span class=\"tani\">%</span></span></div></td>"
[17] "<td class=\"num\"><div>101,314<span>41.15<span class=\"tani\">%</span></span></div></td>"
[18] "<td class=\"num\"><div>99,182<span>40.28<span class=\"tani\">%</span></span></div></td>"
[19] "<td class=\"num\"><div>45,737<span>18.57<span class=\"tani\">%</span></span></div></td>"
[20] "<td class=\"num\"><div>100,400<span>40.81<span class=\"tani\">%</span></span></div></td>"
[21] "<td class=\"num\"><div>98,422<span>40.01<span class=\"tani\">%</span></span></div></td>"
[22] "<td class=\"num\"><div>42,862<span>17.42<span class=\"tani\">%</span></span></div></td>"
[23] "<td class=\"num\"><div>4,307<span>1.75<span class=\"tani\">%</span></span></div></td>"
[24] "<td class=\"num\"><div>117,118<span>50.52<span class=\"tani\">%</span></span></div></td>"
[25] "<td class=\"num\"><div>85,305<span>36.80<span class=\"tani\">%</span></span></div></td>"
[26] "<td class=\"num\"><div>25,531<span>11.01<span class=\"tani\">%</span></span></div></td>"
[27] "<td class=\"num\"><div>3,850<span>1.66<span class=\"tani\">%</span></span></div></td>"
[28] "<td class=\"num\"><div>99,863<span>39.22<span class=\"tani\">%</span></span></div></td>"
[29] "<td class=\"num\"><div>76,283<span>29.96<span class=\"tani\">%</span></span></div></td>"
[30] "<td class=\"num\"><div>41,175<span>16.17<span class=\"tani\">%</span></span></div></td>"
[31] "<td class=\"num\"><div>22,399<span>8.80<span class=\"tani\">%</span></span></div></td>"
[32] "<td class=\"num\"><div>11,997<span>4.71<span class=\"tani\">%</span></span></div></td>"
[33] "<td class=\"num\"><div>2,931<span>1.15<span class=\"tani\">%</span></span></div></td>"
[34] "<td class=\"num\"><div>122,279<span>49.17<span class=\"tani\">%</span></span></div></td>"
[35] "<td class=\"num\"><div>64,731<span>26.03<span class=\"tani\">%</span></span></div></td>"
[36] "<td class=\"num\"><div>57,439<span>23.10<span class=\"tani\">%</span></span></div></td>"
[37] "<td class=\"num\"><div>4,243<span>1.71<span class=\"tani\">%</span></span></div></td>"
[38] "<td class=\"num\"><div>91,146<span>37.37<span class=\"tani\">%</span></span></div></td>"
[39] "<td class=\"num\"><div>70,168<span>28.77<span class=\"tani\">%</span></span></div></td>"
[40] "<td class=\"num\"><div>57,901<span>23.74<span class=\"tani\">%</span></span></div></td>"
[41] "<td class=\"num\"><div>20,828<span>8.54<span class=\"tani\">%</span></span></div></td>"
[42] "<td class=\"num\"><div>2,107<span>0.86<span class=\"tani\">%</span></span></div></td>"
[43] "<td class=\"num\"><div>1,744<span>0.72<span class=\"tani\">%</span></span></div></td>"
[44] "<td class=\"num\"><div>104,612<span>44.90<span class=\"tani\">%</span></span></div></td>"
[45] "<td class=\"num\"><div>60,291<span>25.88<span class=\"tani\">%</span></span></div></td>"
[46] "<td class=\"num\"><div>42,668<span>18.31<span class=\"tani\">%</span></span></div></td>"
[47] "<td class=\"num\"><div>25,426<span>10.91<span class=\"tani\">%</span></span></div></td>"
[48] "<td class=\"num\"><div>112,597<span>51.64<span class=\"tani\">%</span></span></div></td>"
[49] "<td class=\"num\"><div>83,544<span>38.32<span class=\"tani\">%</span></span></div></td>"
[50] "<td class=\"num\"><div>21,892<span>10.04<span class=\"tani\">%</span></span></div></td>"
[51] "<td class=\"num\"><div>120,744<span>55.23<span class=\"tani\">%</span></span></div></td>"
[52] "<td class=\"num\"><div>67,070<span>30.68<span class=\"tani\">%</span></span></div></td>"
[53] "<td class=\"num\"><div>30,807<span>14.09<span class=\"tani\">%</span></span></div></td>"
[54] "<td class=\"num\"><div>104,137<span>46.94<span class=\"tani\">%</span></span></div></td>"
[55] "<td class=\"num\"><div>63,235<span>28.50<span class=\"tani\">%</span></span></div></td>"
[56] "<td class=\"num\"><div>46,600<span>21.00<span class=\"tani\">%</span></span></div></td>"
[57] "<td class=\"num\"><div>4,282<span>1.93<span class=\"tani\">%</span></span></div></td>"
[58] "<td class=\"num\"><div>3,607<span>1.63<span class=\"tani\">%</span></span></div></td>"
[59] "<td class=\"num\"><div>101,155<span>45.55<span class=\"tani\">%</span></span></div></td>"
[60] "<td class=\"num\"><div>70,325<span>31.67<span class=\"tani\">%</span></span></div></td>"
[61] "<td class=\"num\"><div>34,943<span>15.73<span class=\"tani\">%</span></span></div></td>"
[62] "<td class=\"num\"><div>15,667<span>7.05<span class=\"tani\">%</span></span></div></td>"
[63] "<td class=\"num\"><div>84,457<span>40.91<span class=\"tani\">%</span></span></div></td>"
[64] "<td class=\"num\"><div>71,405<span>34.59<span class=\"tani\">%</span></span></div></td>"
[65] "<td class=\"num\"><div>50,568<span>24.50<span class=\"tani\">%</span></span></div></td>"
[66] "<td class=\"num\"><div>127,632<span>57.95<span class=\"tani\">%</span></span></div></td>"
[67] "<td class=\"num\"><div>49,485<span>22.47<span class=\"tani\">%</span></span></div></td>"
[68] "<td class=\"num\"><div>43,138<span>19.59<span class=\"tani\">%</span></span></div></td>"
[69] "<td class=\"num\"><div>96,713<span>40.73<span class=\"tani\">%</span></span></div></td>"
[70] "<td class=\"num\"><div>95,667<span>40.29<span class=\"tani\">%</span></span></div></td>"
[71] "<td class=\"num\"><div>45,081<span>18.98<span class=\"tani\">%</span></span></div></td>"
[72] "<td class=\"num\"><div>96,229<span>41.14<span class=\"tani\">%</span></span></div></td>"
[73] "<td class=\"num\"><div>90,540<span>38.71<span class=\"tani\">%</span></span></div></td>"
[74] "<td class=\"num\"><div>29,743<span>12.72<span class=\"tani\">%</span></span></div></td>"
[75] "<td class=\"num\"><div>17,377<span>7.43<span class=\"tani\">%</span></span></div></td>"
[76] "<td class=\"num\"><div>107,686<span>49.89<span class=\"tani\">%</span></span></div></td>"
[77] "<td class=\"num\"><div>57,741<span>26.75<span class=\"tani\">%</span></span></div></td>"
[78] "<td class=\"num\"><div>50,439<span>23.37<span class=\"tani\">%</span></span></div></td>"
[79] "<td class=\"num\"><div>92,356<span>40.97<span class=\"tani\">%</span></span></div></td>"
[80] "<td class=\"num\"><div>88,225<span>39.14<span class=\"tani\">%</span></span></div></td>"
[81] "<td class=\"num\"><div>38,195<span>16.94<span class=\"tani\">%</span></span></div></td>"
[82] "<td class=\"num\"><div>6,655<span>2.95<span class=\"tani\">%</span></span></div></td>"
[83] "<td class=\"num\"><div>110,493<span>43.39<span class=\"tani\">%</span></span></div></td>"
[84] "<td class=\"num\"><div>91,073<span>35.76<span class=\"tani\">%</span></span></div></td>"
[85] "<td class=\"num\"><div>30,236<span>11.87<span class=\"tani\">%</span></span></div></td>"
[86] "<td class=\"num\"><div>22,859<span>8.98<span class=\"tani\">%</span></span></div></td>"
[87] "<td class=\"num\"><div>110,522<span>44.95<span class=\"tani\">%</span></span></div></td>"
[88] "<td class=\"num\"><div>76,450<span>31.09<span class=\"tani\">%</span></span></div></td>"
[89] "<td class=\"num\"><div>58,929<span>23.96<span class=\"tani\">%</span></span></div></td>"
[90] "<td class=\"num\"><div>122,331<span>49.32<span class=\"tani\">%</span></span></div></td>"
[91] "<td class=\"num\"><div>61,441<span>24.77<span class=\"tani\">%</span></span></div></td>"
[92] "<td class=\"num\"><div>39,892<span>16.08<span class=\"tani\">%</span></span></div></td>"
[93] "<td class=\"num\"><div>24,349<span>9.82<span class=\"tani\">%</span></span></div></td>"
[94] "<td class=\"num\"><div>112,014<span>51.81<span class=\"tani\">%</span></span></div></td>"
[95] "<td class=\"num\"><div>44,884<span>20.76<span class=\"tani\">%</span></span></div></td>"
[96] "<td class=\"num\"><div>38,286<span>17.71<span class=\"tani\">%</span></span></div></td>"
[97] "<td class=\"num\"><div>21,031<span>9.73<span class=\"tani\">%</span></span></div></td>"
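・Use gsub() to remove everything except the vote count, and store the result as tokyo_count. In the pattern, the character class [0-99.99] is equivalent to [0-9.] and consumes the vote share, so the second capture group matches nothing and the replacement "\\1:\\2" leaves a trailing colon.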
tokyo_count <- gsub("<td class=\"num\"><div>(.*)<span>[0-99.99]+(.*)<span class=\"tani\">%</span></span></div></td>","\\1:\\2", tokyo[count])
tokyo_count
[1] "96,255:" "93,234:" "40,376:" "3,806:" "1,570:" "1,307:"
[7] "112,993:" "91,230:" "41,955:" "107,708:" "94,380:" "45,088:"
[13] "115,239:" "53,480:" "35,352:" "26,037:" "101,314:" "99,182:"
[19] "45,737:" "100,400:" "98,422:" "42,862:" "4,307:" "117,118:"
[25] "85,305:" "25,531:" "3,850:" "99,863:" "76,283:" "41,175:"
[31] "22,399:" "11,997:" "2,931:" "122,279:" "64,731:" "57,439:"
[37] "4,243:" "91,146:" "70,168:" "57,901:" "20,828:" "2,107:"
[43] "1,744:" "104,612:" "60,291:" "42,668:" "25,426:" "112,597:"
[49] "83,544:" "21,892:" "120,744:" "67,070:" "30,807:" "104,137:"
[55] "63,235:" "46,600:" "4,282:" "3,607:" "101,155:" "70,325:"
[61] "34,943:" "15,667:" "84,457:" "71,405:" "50,568:" "127,632:"
[67] "49,485:" "43,138:" "96,713:" "95,667:" "45,081:" "96,229:"
[73] "90,540:" "29,743:" "17,377:" "107,686:" "57,741:" "50,439:"
[79] "92,356:" "88,225:" "38,195:" "6,655:" "110,493:" "91,073:"
[85] "30,236:" "22,859:" "110,522:" "76,450:" "58,929:" "122,331:"
[91] "61,441:" "39,892:" "24,349:" "112,014:" "44,884:" "38,286:"
[97] "21,031:"
・Use gsub() to delete the ":".
tokyo_count <- gsub(":", "", tokyo_count)
tokyo_count
[1] "96,255" "93,234" "40,376" "3,806" "1,570" "1,307" "112,993"
[8] "91,230" "41,955" "107,708" "94,380" "45,088" "115,239" "53,480"
[15] "35,352" "26,037" "101,314" "99,182" "45,737" "100,400" "98,422"
[22] "42,862" "4,307" "117,118" "85,305" "25,531" "3,850" "99,863"
[29] "76,283" "41,175" "22,399" "11,997" "2,931" "122,279" "64,731"
[36] "57,439" "4,243" "91,146" "70,168" "57,901" "20,828" "2,107"
[43] "1,744" "104,612" "60,291" "42,668" "25,426" "112,597" "83,544"
[50] "21,892" "120,744" "67,070" "30,807" "104,137" "63,235" "46,600"
[57] "4,282" "3,607" "101,155" "70,325" "34,943" "15,667" "84,457"
[64] "71,405" "50,568" "127,632" "49,485" "43,138" "96,713" "95,667"
[71] "45,081" "96,229" "90,540" "29,743" "17,377" "107,686" "57,741"
[78] "50,439" "92,356" "88,225" "38,195" "6,655" "110,493" "91,073"
[85] "30,236" "22,859" "110,522" "76,450" "58,929" "122,331" "61,441"
[92] "39,892" "24,349" "112,014" "44,884" "38,286" "21,031"
・Use the gsub() function to remove the commas by replacing them with an empty string.
tokyo_count <- gsub(",", "", tokyo_count)
tokyo_count
[1] "96255" "93234" "40376" "3806" "1570" "1307" "112993"
[8] "91230" "41955" "107708" "94380" "45088" "115239" "53480"
[15] "35352" "26037" "101314" "99182" "45737" "100400" "98422"
[22] "42862" "4307" "117118" "85305" "25531" "3850" "99863"
[29] "76283" "41175" "22399" "11997" "2931" "122279" "64731"
[36] "57439" "4243" "91146" "70168" "57901" "20828" "2107"
[43] "1744" "104612" "60291" "42668" "25426" "112597" "83544"
[50] "21892" "120744" "67070" "30807" "104137" "63235" "46600"
[57] "4282" "3607" "101155" "70325" "34943" "15667" "84457"
[64] "71405" "50568" "127632" "49485" "43138" "96713" "95667"
[71] "45081" "96229" "90540" "29743" "17377" "107686" "57741"
[78] "50439" "92356" "88225" "38195" "6655" "110493" "91073"
[85] "30236" "22859" "110522" "76450" "58929" "122331" "61441"
[92] "39892" "24349" "112014" "44884" "38286" "21031"
・This leaves only each candidate's vote count (still stored as character strings).
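・The three gsub() steps can also be collapsed into a single capture followed by a numeric conversion. The following is a minimal sketch, not the method used in this tutorial; sample_lines, votes_chr, and votes are hypothetical names, and the second sample line is made up for illustration.

```r
sample_lines <- c(
  # Real line [97] from the output above
  "<td class=\"num\"><div>21,031<span>9.73<span class=\"tani\">%</span></span></div></td>",
  # Made-up line in the same format, for illustration only
  "<td class=\"num\"><div>1,234<span>0.50<span class=\"tani\">%</span></span></div></td>"
)

# Capture only the digits and commas that follow <div>; drop the rest of the line
votes_chr <- sub("<td class=\"num\"><div>([0-9,]+)<span>.*", "\\1", sample_lines)

# Remove the thousands separators and convert the strings to integers
votes <- as.integer(gsub(",", "", votes_chr))
votes
```

Converting to integer at this point means later steps such as sum() or sorting operate on numbers rather than text.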
・Viewing the page source of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's party affiliation follows the string party.
・Use the grep() function to find the lines of tokyo that hold the party affiliation, and store their line indices as party.
party <- grep("<td class=\"party\"", tokyo)
tokyo[party]
[1] "<td class=\"party\"><div>立憲</div></td>"
[2] "<td class=\"party\"><div>自民</div></td>"
[3] "<td class=\"party\"><div>希望</div></td>"
[4] "<td class=\"party\"><div>諸派</div></td>"
[5] "<td class=\"party\"><div>諸派</div></td>"
[6] "<td class=\"party\"><div>諸派</div></td>"
[7] "<td class=\"party\"><div>自民</div></td>"
[8] "<td class=\"party\"><div>立憲</div></td>"
[9] "<td class=\"party\"><div>希望</div></td>"
[10] "<td class=\"party\"><div>自民</div></td>"
[11] "<td class=\"party\"><div>希望</div></td>"
[12] "<td class=\"party\"><div>共産</div></td>"
[13] "<td class=\"party\"><div>自民</div></td>"
[14] "<td class=\"party\"><div>立憲</div></td>"
[15] "<td class=\"party\"><div>希望</div></td>"
[16] "<td class=\"party\"><div>共産</div></td>"
[17] "<td class=\"party\"><div>自民</div></td>"
[18] "<td class=\"party\"><div>立憲</div></td>"
[19] "<td class=\"party\"><div>希望</div></td>"
[20] "<td class=\"party\"><div>立憲</div></td>"
[21] "<td class=\"party\"><div>自民</div></td>"
[22] "<td class=\"party\"><div>希望</div></td>"
[23] "<td class=\"party\"><div>諸派</div></td>"
[24] "<td class=\"party\"><div>立憲</div></td>"
[25] "<td class=\"party\"><div>自民</div></td>"
[26] "<td class=\"party\"><div>希望</div></td>"
[27] "<td class=\"party\"><div>無所</div></td>"
[28] "<td class=\"party\"><div>自民</div></td>"
[29] "<td class=\"party\"><div>立憲</div></td>"
[30] "<td class=\"party\"><div>希望</div></td>"
[31] "<td class=\"party\"><div>共産</div></td>"
[32] "<td class=\"party\"><div>無所</div></td>"
[33] "<td class=\"party\"><div>諸派</div></td>"
[34] "<td class=\"party\"><div>自民</div></td>"
[35] "<td class=\"party\"><div>希望</div></td>"
[36] "<td class=\"party\"><div>共産</div></td>"
[37] "<td class=\"party\"><div>無所</div></td>"
[38] "<td class=\"party\"><div>自民</div></td>"
[39] "<td class=\"party\"><div>立憲</div></td>"
[40] "<td class=\"party\"><div>希望</div></td>"
[41] "<td class=\"party\"><div>共産</div></td>"
[42] "<td class=\"party\"><div>無所</div></td>"
[43] "<td class=\"party\"><div>諸派</div></td>"
[44] "<td class=\"party\"><div>自民</div></td>"
[45] "<td class=\"party\"><div>立憲</div></td>"
[46] "<td class=\"party\"><div>希望</div></td>"
[47] "<td class=\"party\"><div>共産</div></td>"
[48] "<td class=\"party\"><div>公明</div></td>"
[49] "<td class=\"party\"><div>共産</div></td>"
[50] "<td class=\"party\"><div>諸派</div></td>"
[51] "<td class=\"party\"><div>自民</div></td>"
[52] "<td class=\"party\"><div>立憲</div></td>"
[53] "<td class=\"party\"><div>共産</div></td>"
[54] "<td class=\"party\"><div>自民</div></td>"
[55] "<td class=\"party\"><div>希望</div></td>"
[56] "<td class=\"party\"><div>共産</div></td>"
[57] "<td class=\"party\"><div>諸派</div></td>"
[58] "<td class=\"party\"><div>無所</div></td>"
[59] "<td class=\"party\"><div>自民</div></td>"
[60] "<td class=\"party\"><div>希望</div></td>"
[61] "<td class=\"party\"><div>共産</div></td>"
[62] "<td class=\"party\"><div>無所</div></td>"
[63] "<td class=\"party\"><div>自民</div></td>"
[64] "<td class=\"party\"><div>立憲</div></td>"
[65] "<td class=\"party\"><div>希望</div></td>"
[66] "<td class=\"party\"><div>自民</div></td>"
[67] "<td class=\"party\"><div>希望</div></td>"
[68] "<td class=\"party\"><div>共産</div></td>"
[69] "<td class=\"party\"><div>立憲</div></td>"
[70] "<td class=\"party\"><div>自民</div></td>"
[71] "<td class=\"party\"><div>希望</div></td>"
[72] "<td class=\"party\"><div>自民</div></td>"
[73] "<td class=\"party\"><div>立憲</div></td>"
[74] "<td class=\"party\"><div>希望</div></td>"
[75] "<td class=\"party\"><div>共産</div></td>"
[76] "<td class=\"party\"><div>自民</div></td>"
[77] "<td class=\"party\"><div>共産</div></td>"
[78] "<td class=\"party\"><div>希望</div></td>"
[79] "<td class=\"party\"><div>希望</div></td>"
[80] "<td class=\"party\"><div>自民</div></td>"
[81] "<td class=\"party\"><div>社民</div></td>"
[82] "<td class=\"party\"><div>諸派</div></td>"
[83] "<td class=\"party\"><div>自民</div></td>"
[84] "<td class=\"party\"><div>立憲</div></td>"
[85] "<td class=\"party\"><div>希望</div></td>"
[86] "<td class=\"party\"><div>共産</div></td>"
[87] "<td class=\"party\"><div>自民</div></td>"
[88] "<td class=\"party\"><div>希望</div></td>"
[89] "<td class=\"party\"><div>共産</div></td>"
[90] "<td class=\"party\"><div>自民</div></td>"
[91] "<td class=\"party\"><div>立憲</div></td>"
[92] "<td class=\"party\"><div>希望</div></td>"
[93] "<td class=\"party\"><div>共産</div></td>"
[94] "<td class=\"party\"><div>自民</div></td>"
[95] "<td class=\"party\"><div>立憲</div></td>"
[96] "<td class=\"party\"><div>希望</div></td>"
[97] "<td class=\"party\"><div>共産</div></td>"
・Use the gsub() function to strip everything except the party name, and name the result tokyo_party.
tokyo_party <- gsub("<td class=\"party\"><div>(.*)</div></td>","\\1", tokyo[party])
tokyo_party
[1] "立憲" "自民" "希望" "諸派" "諸派" "諸派" "自民" "立憲" "希望" "自民"
[11] "希望" "共産" "自民" "立憲" "希望" "共産" "自民" "立憲" "希望" "立憲"
[21] "自民" "希望" "諸派" "立憲" "自民" "希望" "無所" "自民" "立憲" "希望"
[31] "共産" "無所" "諸派" "自民" "希望" "共産" "無所" "自民" "立憲" "希望"
[41] "共産" "無所" "諸派" "自民" "立憲" "希望" "共産" "公明" "共産" "諸派"
[51] "自民" "立憲" "共産" "自民" "希望" "共産" "諸派" "無所" "自民" "希望"
[61] "共産" "無所" "自民" "立憲" "希望" "自民" "希望" "共産" "立憲" "自民"
[71] "希望" "自民" "立憲" "希望" "共産" "自民" "共産" "希望" "希望" "自民"
[81] "社民" "諸派" "自民" "立憲" "希望" "共産" "自民" "希望" "共産" "自民"
[91] "立憲" "希望" "共産" "自民" "立憲" "希望" "共産"
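・grep() can also return the matching lines themselves rather than their indices, which removes the separate indexing step. A minimal sketch under that assumption; page, party_lines, party_names, and party_tally are hypothetical names, and page is a four-line stand-in for the tokyo vector.

```r
# Stand-in for `tokyo`: three party lines in the format shown above
# plus one unrelated line
page <- c(
  "<td class=\"party\"><div>立憲</div></td>",
  "<td class=\"status\"><div>元</div></td>",
  "<td class=\"party\"><div>自民</div></td>",
  "<td class=\"party\"><div>自民</div></td>"
)

# value = TRUE returns the matching lines instead of their indices
party_lines <- grep("<td class=\"party\"", page, value = TRUE)

# Keep only the text between <div> and </div>
party_names <- sub(".*<div>(.*)</div>.*", "\\1", party_lines)

# Tally candidates per party as a quick sanity check
party_tally <- table(party_names)
party_tally
```

Running table() on the full tokyo_party vector would give the number of candidates fielded by each party.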
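・Viewing the page source of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's number of election wins follows the string tosenkaisu.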
・Use the grep() function to find the lines of tokyo that hold the number of election wins, and store their line indices as previous.
previous <- grep("<td class=\"tosenkaisu\"", tokyo)
tokyo[previous]
[1] "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>"
[2] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[3] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[4] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[5] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[6] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[7] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[8] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[9] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[10] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[11] "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>"
[12] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[13] "<td class=\"tosenkaisu\"><div>5<span>回</span></div></td>"
[14] "<td class=\"tosenkaisu\"><div>1<span>回</span></div></td>"
[15] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[16] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[17] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[18] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[19] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[20] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>"
[21] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[22] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[23] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[24] "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>"
[25] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[26] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[27] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[28] "<td class=\"tosenkaisu\"><div>10<span>回</span></div></td>"
[29] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[30] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>"
[31] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[32] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[33] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[34] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>"
[35] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[36] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[37] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[38] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>"
[39] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[40] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>"
[41] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[42] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[43] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[44] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>"
[45] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[46] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[47] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[48] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>"
[49] "<td class=\"tosenkaisu\"><div>1<span>回</span></div></td>"
[50] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[51] "<td class=\"tosenkaisu\"><div>9<span>回</span></div></td>"
[52] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[53] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[54] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>"
[55] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[56] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[57] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[58] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[59] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[60] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[61] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[62] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[63] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[64] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[65] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[66] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>"
[67] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[68] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[69] "<td class=\"tosenkaisu\"><div>13<span>回</span></div></td>"
[70] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[71] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[72] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[73] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>"
[74] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[75] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[76] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[77] "<td class=\"tosenkaisu\"><div>2<span>回</span></div></td>"
[78] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[79] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>"
[80] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[81] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[82] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[83] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>"
[84] "<td class=\"tosenkaisu\"><div>4<span>回</span></div></td>"
[85] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[86] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[87] "<td class=\"tosenkaisu\"><div>3<span>回</span></div></td>"
[88] "<td class=\"tosenkaisu\"><div>1<span>回</span></div></td>"
[89] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[90] "<td class=\"tosenkaisu\"><div>5<span>回</span></div></td>"
[91] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[92] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[93] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[94] "<td class=\"tosenkaisu\"><div>6<span>回</span></div></td>"
[95] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
[96] "<td class=\"tosenkaisu\"><div>8<span>回</span></div></td>"
[97] "<td class=\"tosenkaisu\"><div>0<span>回</span></div></td>"
・Use the gsub() function to strip everything except the number of election wins, and name the result tokyo_previous.
tokyo_previous <- gsub("<td class=\"tosenkaisu\"><div>(.*)<span>回</span></div></td>","\\1", tokyo[previous])
tokyo_previous
[1] "7" "3" "0" "0" "0" "0" "3" "0" "0" "4" "7" "0" "5" "1"
[15] "0" "0" "4" "4" "3" "2" "4" "0" "0" "7" "4" "0" "0" "10"
[29] "0" "2" "0" "0" "0" "6" "0" "0" "0" "2" "0" "2" "0" "0"
[43] "0" "8" "0" "0" "0" "8" "1" "0" "9" "0" "0" "6" "0" "0"
[57] "0" "0" "3" "4" "0" "0" "3" "3" "3" "8" "0" "0" "13" "3"
[71] "0" "4" "6" "0" "0" "4" "2" "0" "6" "3" "0" "0" "8" "4"
[85] "0" "0" "3" "1" "0" "5" "0" "0" "0" "6" "0" "8" "0"
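・The values in tokyo_previous are character strings, so comparisons and sorting treat them as text. A minimal sketch of converting them to integers, using lines [1], [28], and [69] from the output above; prev_lines, prev_chr, and prev_int are hypothetical names.

```r
prev_lines <- c(
  "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>",
  "<td class=\"tosenkaisu\"><div>10<span>回</span></div></td>",
  "<td class=\"tosenkaisu\"><div>13<span>回</span></div></td>"
)

# [0-9]+ accepts one or more digits, so "10" and "13" are captured whole
prev_chr <- sub("<td class=\"tosenkaisu\"><div>([0-9]+)<span>.*", "\\1", prev_lines)

# Convert the strings to integers
prev_int <- as.integer(prev_chr)

# As text, "10" sorts before "7"; as integers the order is numeric
"10" < "7"
sort(prev_int)
```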
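・Viewing the page source of the Asahi Shimbun 2017 general election page (Tokyo constituencies) shows that each candidate's "status" follows the string status. The values are 前 (held a seat when the House was dissolved), 元 (former member), and 新 (newcomer).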
・Use the grep() function to find the lines of tokyo that hold the candidate "status", and store their line indices as status.
status <- grep("<td class=\"status\"", tokyo)
tokyo[status]
[1] "<td class=\"status\"><div>元</div></td>"
[2] "<td class=\"status\"><div>前</div></td>"
[3] "<td class=\"status\"><div>新</div></td>"
[4] "<td class=\"status\"><div>新</div></td>"
[5] "<td class=\"status\"><div>新</div></td>"
[6] "<td class=\"status\"><div>新</div></td>"
[7] "<td class=\"status\"><div>前</div></td>"
[8] "<td class=\"status\"><div>新</div></td>"
[9] "<td class=\"status\"><div>新</div></td>"
[10] "<td class=\"status\"><div>前</div></td>"
[11] "<td class=\"status\"><div>前</div></td>"
[12] "<td class=\"status\"><div>新</div></td>"
[13] "<td class=\"status\"><div>前</div></td>"
[14] "<td class=\"status\"><div>元</div></td>"
[15] "<td class=\"status\"><div>新</div></td>"
[16] "<td class=\"status\"><div>新</div></td>"
[17] "<td class=\"status\"><div>前</div></td>"
[18] "<td class=\"status\"><div>元</div></td>"
[19] "<td class=\"status\"><div>前</div></td>"
[20] "<td class=\"status\"><div>前</div></td>"
[21] "<td class=\"status\"><div>前</div></td>"
[22] "<td class=\"status\"><div>新</div></td>"
[23] "<td class=\"status\"><div>新</div></td>"
[24] "<td class=\"status\"><div>前</div></td>"
[25] "<td class=\"status\"><div>前</div></td>"
[26] "<td class=\"status\"><div>新</div></td>"
[27] "<td class=\"status\"><div>新</div></td>"
[28] "<td class=\"status\"><div>前</div></td>"
[29] "<td class=\"status\"><div>新</div></td>"
[30] "<td class=\"status\"><div>前</div></td>"
[31] "<td class=\"status\"><div>新</div></td>"
[32] "<td class=\"status\"><div>新</div></td>"
[33] "<td class=\"status\"><div>新</div></td>"
[34] "<td class=\"status\"><div>前</div></td>"
[35] "<td class=\"status\"><div>新</div></td>"
[36] "<td class=\"status\"><div>新</div></td>"
[37] "<td class=\"status\"><div>新</div></td>"
[38] "<td class=\"status\"><div>前</div></td>"
[39] "<td class=\"status\"><div>新</div></td>"
[40] "<td class=\"status\"><div>前</div></td>"
[41] "<td class=\"status\"><div>新</div></td>"
[42] "<td class=\"status\"><div>新</div></td>"
[43] "<td class=\"status\"><div>新</div></td>"
[44] "<td class=\"status\"><div>前</div></td>"
[45] "<td class=\"status\"><div>新</div></td>"
[46] "<td class=\"status\"><div>新</div></td>"
[47] "<td class=\"status\"><div>新</div></td>"
[48] "<td class=\"status\"><div>前</div></td>"
[49] "<td class=\"status\"><div>前</div></td>"
[50] "<td class=\"status\"><div>新</div></td>"
[51] "<td class=\"status\"><div>前</div></td>"
[52] "<td class=\"status\"><div>新</div></td>"
[53] "<td class=\"status\"><div>新</div></td>"
[54] "<td class=\"status\"><div>前</div></td>"
[55] "<td class=\"status\"><div>新</div></td>"
[56] "<td class=\"status\"><div>新</div></td>"
[57] "<td class=\"status\"><div>新</div></td>"
[58] "<td class=\"status\"><div>新</div></td>"
[59] "<td class=\"status\"><div>前</div></td>"
[60] "<td class=\"status\"><div>前</div></td>"
[61] "<td class=\"status\"><div>新</div></td>"
[62] "<td class=\"status\"><div>新</div></td>"
[63] "<td class=\"status\"><div>前</div></td>"
[64] "<td class=\"status\"><div>前</div></td>"
[65] "<td class=\"status\"><div>元</div></td>"
[66] "<td class=\"status\"><div>前</div></td>"
[67] "<td class=\"status\"><div>新</div></td>"
[68] "<td class=\"status\"><div>新</div></td>"
[69] "<td class=\"status\"><div>前</div></td>"
[70] "<td class=\"status\"><div>前</div></td>"
[71] "<td class=\"status\"><div>新</div></td>"
[72] "<td class=\"status\"><div>前</div></td>"
[73] "<td class=\"status\"><div>元</div></td>"
[74] "<td class=\"status\"><div>新</div></td>"
[75] "<td class=\"status\"><div>新</div></td>"
[76] "<td class=\"status\"><div>前</div></td>"
[77] "<td class=\"status\"><div>前</div></td>"
[78] "<td class=\"status\"><div>新</div></td>"
[79] "<td class=\"status\"><div>前</div></td>"
[80] "<td class=\"status\"><div>前</div></td>"
[81] "<td class=\"status\"><div>新</div></td>"
[82] "<td class=\"status\"><div>新</div></td>"
[83] "<td class=\"status\"><div>前</div></td>"
[84] "<td class=\"status\"><div>元</div></td>"
[85] "<td class=\"status\"><div>新</div></td>"
[86] "<td class=\"status\"><div>新</div></td>"
[87] "<td class=\"status\"><div>前</div></td>"
[88] "<td class=\"status\"><div>新</div></td>"
[89] "<td class=\"status\"><div>新</div></td>"
[90] "<td class=\"status\"><div>前</div></td>"
[91] "<td class=\"status\"><div>新</div></td>"
[92] "<td class=\"status\"><div>新</div></td>"
[93] "<td class=\"status\"><div>新</div></td>"
[94] "<td class=\"status\"><div>前</div></td>"
[95] "<td class=\"status\"><div>新</div></td>"
[96] "<td class=\"status\"><div>前</div></td>"
[97] "<td class=\"status\"><div>新</div></td>"
・Use the gsub() function to strip everything except the candidate "status", and name the result tokyo_status.
tokyo_status <- gsub("<td class=\"status\"><div>(.*)</div></td>","\\1", tokyo[status])
tokyo_status
[1] "元" "前" "新" "新" "新" "新" "前" "新" "新" "前" "前" "新" "前" "元"
[15] "新" "新" "前" "元" "前" "前" "前" "新" "新" "前" "前" "新" "新" "前"
[29] "新" "前" "新" "新" "新" "前" "新" "新" "新" "前" "新" "前" "新" "新"
[43] "新" "前" "新" "新" "新" "前" "前" "新" "前" "新" "新" "前" "新" "新"
[57] "新" "新" "前" "前" "新" "新" "前" "前" "元" "前" "新" "新" "前" "前"
[71] "新" "前" "元" "新" "新" "前" "前" "新" "前" "前" "新" "新" "前" "元"
[85] "新" "新" "前" "新" "新" "前" "新" "新" "新" "前" "新" "前" "新"
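・The grep() and gsub() pairs for party, tosenkaisu, and status all follow the same pattern, so they could be wrapped in one function. The following is a minimal sketch; extract_cell is a hypothetical helper, not part of this tutorial, and it assumes one <td> cell per line with no leading whitespace, as in the output shown above.

```r
# Hypothetical helper: return the text that directly follows
# <td class="CLASS"><div> on every matching line
extract_cell <- function(lines, class_name) {
  opening <- paste0("<td class=\"", class_name, "\"><div>")
  # fixed = TRUE treats the opening tags as a literal string, not a regex
  hits <- grep(opening, lines, fixed = TRUE, value = TRUE)
  # Drop the opening tags, then everything from the next "<" onward
  inner <- sub(opening, "", hits, fixed = TRUE)
  sub("<.*", "", inner)
}

# Stand-in for `tokyo`, using lines in the formats shown above
page <- c(
  "<td class=\"party\"><div>立憲</div></td>",
  "<td class=\"status\"><div>元</div></td>",
  "<td class=\"tosenkaisu\"><div>7<span>回</span></div></td>",
  "<td class=\"num\"><div>21,031<span>9.73<span class=\"tani\">%</span></span></div></td>"
)

cell_previous <- extract_cell(page, "tosenkaisu")
cell_votes    <- extract_cell(page, "num")
cell_party    <- extract_cell(page, "party")
```

With the real data this would be called as, for example, extract_cell(tokyo, "status"). Whether the name and age cells follow the same layout is not shown here, so the helper is only claimed to work for the four cell types above.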
・Put the six extracted vectors into a data frame and name it df.hr.tokyo.
df.hr.tokyo <- data.frame(name = tokyo_name,
                          age = tokyo_age,
                          count = tokyo_count,
                          party = tokyo_party,
                          status = tokyo_status,
                          previous = tokyo_previous)
df.hr.tokyo
name age count party status previous
1 海江田万里 68 96255 立憲 元 7
2 山田美樹 43 93234 自民 前 3
3 松沢香 39 40376 希望 新 0
4 原口実季 28 3806 諸派 新 0
5 犬丸光加 57 1570 諸派 新 0
6 又吉光雄 73 1307 諸派 新 0
7 辻清人 38 112993 自民 前 3
8 松尾明弘 42 91230 立憲 新 0
9 鳩山太郎 43 41955 希望 新 0
10 石原宏高 53 107708 自民 前 4
11 松原仁 61 94380 希望 前 7
12 香西克介 41 45088 共産 新 0
13 平将明 50 115239 自民 前 5
14 井戸正枝 51 53480 立憲 元 1
15 難波美智代 43 35352 希望 新 0
16 青山昂平 26 26037 共産 新 0
17 若宮健嗣 56 101314 自民 前 4
18 手塚仁雄 51 99182 立憲 元 4
19 福田峰之 53 45737 希望 前 3
20 落合貴之 38 100400 立憲 前 2
21 越智隆雄 53 98422 自民 前 4
22 植松恵美子 49 42862 希望 新 0
23 中岡茉妃 26 4307 諸派 新 0
24 長妻昭 57 117118 立憲 前 7
25 松本文明 68 85305 自民 前 4
26 荒木章博 64 25531 希望 新 0
27 井上郁磨 26 3850 無所 新 0
28 石原伸晃 60 99863 自民 前 10
29 吉田晴美 45 76283 立憲 新 0
30 木内孝胤 51 41175 希望 前 2
31 長内史子 29 22399 共産 新 0
32 円より子 70 11997 無所 新 0
33 斎藤郁真 29 2931 諸派 新 0
34 菅原一秀 55 122279 自民 前 6
35 高松智之 43 64731 希望 新 0
36 原純子 53 57439 共産 新 0
37 前田吉成 62 4243 無所 新 0
38 鈴木隼人 40 91146 自民 前 2
39 鈴木庸介 41 70168 立憲 新 0
40 若狭勝 60 57901 希望 前 2
41 岸良信 62 20828 共産 新 0
42 小山徹 42 2107 無所 新 0
43 吉井利光 35 1744 諸派 新 0
44 下村博文 63 104612 自民 前 8
45 前田順一郎 42 60291 立憲 新 0
46 宍戸千絵 39 42668 希望 新 0
47 小堤東 28 25426 共産 新 0
48 太田昭宏 72 112597 公明 前 8
49 池内沙織 35 83544 共産 前 1
50 中村勝 66 21892 諸派 新 0
51 鴨下一郎 68 120744 自民 前 9
52 北條智彦 34 67070 立憲 新 0
53 祖父江元希 42 30807 共産 新 0
54 松島みどり 61 104137 自民 前 6
55 矢作麻子 39 63235 希望 新 0
56 阿藤和之 46 46600 共産 新 0
57 清井美穂 54 4282 諸派 新 0
58 大塚紀久雄 76 3607 無所 新 0
59 秋元司 46 101155 自民 前 3
60 柿沢未途 46 70325 希望 前 4
61 吉田年男 69 34943 共産 新 0
62 猪野隆 52 15667 無所 新 0
63 大西英男 71 84457 自民 前 3
64 初鹿明博 48 71405 立憲 前 3
65 田村謙治 49 50568 希望 元 3
66 平沢勝栄 72 127632 自民 前 8
67 西田主税 55 49485 希望 新 0
68 新井杉生 58 43138 共産 新 0
69 菅直人 71 96713 立憲 前 13
70 土屋正忠 75 95667 自民 前 3
71 鴇田敦 51 45081 希望 新 0
72 松本洋平 44 96229 自民 前 4
73 末松義規 60 90540 立憲 元 6
74 佐々木里加 50 29743 希望 新 0
75 杉下茂雄 68 17377 共産 新 0
76 木原誠二 47 107686 自民 前 4
77 宮本徹 45 57741 共産 前 2
78 鹿野晃 44 50439 希望 新 0
79 長島昭久 55 92356 希望 前 6
80 小田原潔 53 88225 自民 前 3
81 小糸健介 35 38195 社民 新 0
82 天木直人 70 6655 諸派 新 0
83 伊藤達也 56 110493 自民 前 8
84 山花郁夫 50 91073 立憲 元 4
85 金ケ崎絵美 41 30236 希望 新 0
86 阿部真 43 22859 共産 新 0
87 小倉将信 36 110522 自民 前 3
88 伊藤俊輔 38 76450 希望 新 1
89 松村亮佑 37 58929 共産 新 0
90 萩生田光一 54 122331 自民 前 5
91 高橋斉久 44 61441 立憲 新 0
92 吉羽美華 37 39892 希望 新 0
93 飯田美弥子 57 24349 共産 新 0
94 井上信治 48 112014 自民 前 6
95 山下容子 58 44884 立憲 新 0
96 小沢鋭仁 63 38286 希望 前 8
97 井上宣 43 21031 共産 新 0
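・Because tokyo_count and tokyo_previous are character vectors, the corresponding columns of the data frame hold text rather than numbers. A minimal sketch of building the data frame with numeric columns, using the first three rows of the table above; df_typed is a hypothetical name, and the sketch assumes the age values are also stored as strings.

```r
# First three rows of the table above; in the real workflow these
# would be the full tokyo_* vectors
name     <- c("海江田万里", "山田美樹", "松沢香")
age      <- c("68", "43", "39")
count    <- c("96255", "93234", "40376")
party    <- c("立憲", "自民", "希望")
status   <- c("元", "前", "新")
previous <- c("7", "3", "0")

df_typed <- data.frame(
  name     = name,
  age      = as.integer(age),       # convert text to integer
  count    = as.integer(count),
  party    = party,
  status   = status,
  previous = as.integer(previous),
  stringsAsFactors = FALSE          # keep text columns as character
)

# Inspect the column types, then compute a numeric summary
str(df_typed)
total_votes <- sum(df_typed$count)
```

With integer columns, functions such as sum(), mean(), and order() work on the vote counts directly.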
write.csv(df.hr.tokyo, "hr2017_tokyo.csv",
fileEncoding = "CP932")