【SQL】 ロング型からワイド型へのデータ変換②(Data conversion from long to wide type(2))

①の続き。Flourish での可視化のために、データの形を変えることにした。SQLで書く場合の記述です。

  • 使用データ:[ Google Cloud Public Datasets] chicago_crime_data (Chicago Police Department crime data from 2001 to present)
  • 期間:2010-01-01 ~ 2020-12-31(10年間)
  • 「年」×「犯罪タイプ」×「件数」のロングデータ


現状データ(過去記事参照)を、「年」×「犯罪タイプ件数」のワイドデータに変換します。


【SQL】一部

 SELECT
 year, MAX(sum_year) AS sum_year_total,
 MAX(case primary_type when 'THEFT' then primary_count else null end) as THEFT_count, MAX(case primary_type when 'BATTERY' then primary_count else null end) as BATTERY_count, MAX(case primary_type when 'NARCOTICS' then primary_count else null end) as NARCOTICS_count,  MAX(case primary_type when 'CRIMINAL DAMAGE' then primary_count else null end) as CRIMINALDAMAGE_count,  MAX(case primary_type when 'BURGLARY' then primary_count else null end) as BURGLARY_count 
FROM( SELECT year, primary_type, primary_count, sum_year 
 FROM total_count JOIN type_year USING(year) 
 ORDER BY year, primary_count DESC

)

GROUP BY year

ORDER BY year

太字をご覧ください。max関数を、犯罪タイプの数ぶん書かないといけません。36タイプくらい?いや無理、手書き怖い、ミスの元!

SQLってロングに強いですよね…だから狙い定めないと煩雑になるんですよ…。


【結果】

これがFlourish可視化のために欲しいデータのかたちです。犯罪タイプが右に並んでいく流れですね。今回は上位5位でやめました。


「PIVOT/UNPIVOT」なる縦横変換関数があるようだけど、練習してないので断念。どっちにしろ、SQLでは表頭をSELECTで指示しないといけないので、犯罪タイプの数だけ長くなること間違いなし。

BigQuery Query syntax :PIVOT operator:PIVOT演算子は、集約を使って行を列に回転させる。PIVOTはFROM句の一部です。

こういうことをしなくていい為のBIツール、目的地到達への手段は複数持っておく。③につづく。



********


Continuation of (1). for visualisation in Flourish, we decided to change the form of the data. this is the description for writing in SQL.


Convert current data (see previous article) into 'year' x 'number of crime types' wide data.

[SQL] Partially

 SELECT
 year, MAX(sum_year) AS sum_year_total,
 MAX(case primary_type when 'THEFT' then primary_count else null end) as THEFT_count, MAX(case primary_type when 'BATTERY' then primary_count else null end) as BATTERY_count, MAX(case primary_type when 'NARCOTICS' then primary_count else null end) as NARCOTICS_count,  MAX(case primary_type when 'CRIMINAL DAMAGE' then primary_count else null end) as CRIMINALDAMAGE_count,  MAX(case primary_type when 'BURGLARY' then primary_count else null end) as BURGLARY_count
 FROM(
 SELECT year, primary_type, primary_count, sum_year 
 FROM total_count JOIN type_year USING(year) 

 ORDER BY year, primary_count DESC

)

GROUP BY year

ORDER BY year

See bold, you have to write the max function for the number of crime types, about 36 types? No, I can't, I'm afraid to write by hand, it's a source of mistakes!

SQL is very long... so if you don't aim it, it gets complicated...


【Result】

This is the form of data you want for the Flourish visualisation. It's the flow of crime types right next to each other. This time I will stop at the top five.

There seems to be a 'PIVOT/UNPIVOT' vertical/horizontal conversion function in SQL, but I haven't practised it, so I've given up. Anyway, in SQL you have to use SELECT to specify the table header, so it will definitely be as long as the number of crime types.


BigQuery Query syntax :PIVOT operator

The PIVOT operator rotates rows into columns, using aggregation. PIVOT is part of the FROM clause.


BI tools to avoid having to do this, and have multiple means of reaching your destination. Continued in (3).