Load CSV into PostgreSQL

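CSV flat files are a common way to move data around. Loading one into a database usually means creating the table first and then loading the CSV into it. Can we load a CSV when the table does not exist yet and have the table created automatically? That is very useful for quick analysis.

Tools like pgfutter can import delimited flat files into PostgreSQL. pgfutter creates the table automatically (with every column of type text) or appends the data to an existing table.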

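Refer to https://github.com/lukasmartinelli/pgfutter/

pgfutter loads a CSV into PostgreSQL as follows: if the table already exists, the rows are appended; if it does not, a table named after the file is created, with all columns of type text.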

./pgfutter_linux_amd64 --host "hostname" --port "5432" --db "dbname" --schema "schemaName" --user "username" --pw "password" csv All-Rewards.csv
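pgfutter does the table creation itself. As a rough mental model only (this is not pgfutter's source code), the sketch below assumes the table name is the file's base name with the extension dropped, lower-cased, and with every character other than a letter, digit or underscore replaced by "_". That is consistent with All-Rewards.csv becoming the All_Rewards table in the ALTER TABLE example below, which PostgreSQL folds to all_rewards because the name is unquoted. The function names are hypothetical, and the header is split naively on commas, so quoted header fields containing commas are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers approximating what pgfutter does when the target
# table does not exist yet. This is NOT pgfutter's actual implementation.

# Derive a table name from a CSV path: drop the directory and extension,
# lower-case it, and replace anything outside [a-z0-9_] with "_".
table_name_from_csv() {
  local base
  base=$(basename "$1")
  base=${base%.*}
  printf '%s\n' "$base" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_]/_/g'
}

# Build a CREATE TABLE statement with one text column per header field.
# Assumes a simple header: comma-separated, no quoted commas.
create_table_sql() {
  local csv=$1 schema=$2 table cols
  table=$(table_name_from_csv "$csv")
  # tr -d '\r' drops Windows line endings; each field becomes "<name> text".
  cols=$(head -n 1 "$csv" | tr -d '\r' | tr ',' '\n' \
    | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_]/_/g; s/$/ text/' \
    | paste -sd, -)
  printf 'create table %s.%s (%s);\n' "$schema" "$table" "$cols"
}
```

For example, create_table_sql All-Rewards.csv schemaName prints a create table schemaName.all_rewards (...) statement with one text column per header field.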

Data types can then be converted inside the database, either by casting in your SQL queries or by changing the column types.

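su - postgres

psql -h hostname -d dbname -U username -c "alter table schemaName.All_Rewards alter column REWARD_CONCURRENCE type integer using (trim(REWARD_CONCURRENCE)::integer);"

(psql -c runs the statement and exits on its own; \q is only needed to leave an interactive psql session.)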

Or make the changes in a SQL client:

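-- text -> integer
alter table sao_paulo_20171011 alter column id type integer using (trim(id)::integer);
-- casting through numeric(10) rounds values such as '12.5' instead of rejecting them
alter table sao_paulo_20171011 alter column price type integer using (trim(price)::numeric(10));
-- empty strings become NULL, otherwise the cast to timestamp fails
alter table sao_paulo_20171011 alter column update_time type timestamp using (trim(case when update_time = '' then null else update_time end)::timestamp);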
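When many text columns need converting, the statements above all follow one pattern: alter column <col> type <type> using (<expression>::<type>). A minimal sketch, assuming you want empty strings turned into NULL as in the update_time example; nullif(trim(col), '') does the same job as the case expression and also catches whitespace-only values. The helper name is hypothetical, and it only prints SQL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ALTER TABLE statement that converts a text
# column to another type, mapping empty or blank strings to NULL first.
# Identifiers are not quoted, so PostgreSQL folds them to lower case.
alter_type_sql() {
  local table=$1 column=$2 type=$3
  printf "alter table %s alter column %s type %s using (nullif(trim(%s), '')::%s);\n" \
    "$table" "$column" "$type" "$column" "$type"
}
```

The output can be pasted into a SQL client or piped to psql, e.g. alter_type_sql sao_paulo_20171011 update_time timestamp | psql -h hostname -d dbname -U username -f - (psql reads standard input when the file name is -).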

 

PostgreSQL data type reference:

http://www.postgres.cn/docs/9.5/datatype.html
https://www.postgresql.org/docs/9.5/static/datatype.html

Linux binary download:

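https://github.com/lukasmartinelli/pgfutter/releases/download/v1.1/pgfutter_linux_amd64

Connecting R to databases via JDBC

Connecting through JDBC is the most general option: with the database's JDBC driver and a Java runtime, you can connect to almost every common database. That is why I prefer the Squirrel client plus JDBC for everyday querying and data processing. In R, connecting takes only a few steps.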
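Every connection in the snippets that follow needs the same three pieces: a driver class name, the driver jar on the class path, and a JDBC URL. The sketch below simply collects the driver classes, URL formats and default ports used in this post so they are easy to compare side by side. The function name and argument order are hypothetical, and the formats are only the ones shown here (the Oracle URL is the SID form host:port:SID; the Teradata example sets TMODE rather than a database).

```shell
#!/usr/bin/env bash
# Hypothetical lookup of the driver class and JDBC URL used for each database
# in this post. Arguments: vendor host database [port]; an empty port falls
# back to the default used in the examples. The Teradata URL ignores the
# database and port arguments, matching the example in this post.
jdbc_settings() {
  local vendor=$1 host=$2 db=$3 port=$4
  case $vendor in
    postgresql) echo "org.postgresql.Driver jdbc:postgresql://${host}:${port:-5432}/${db}" ;;
    vertica)    echo "com.vertica.jdbc.Driver jdbc:vertica://${host}:${port:-5433}/${db}" ;;
    oracle)     echo "oracle.jdbc.OracleDriver jdbc:oracle:thin:@${host}:${port:-1521}:${db}" ;;
    mysql)      echo "com.mysql.jdbc.Driver jdbc:mysql://${host}:${port:-3306}/${db}" ;;
    sqlserver)  echo "com.microsoft.sqlserver.jdbc.SQLServerDriver jdbc:sqlserver://${host}:${port:-1433};databaseName=${db}" ;;
    teradata)   echo "com.teradata.jdbc.TeraDriver jdbc:teradata://${host}/TMODE=TERA" ;;
    *)          echo "unknown vendor: $vendor" >&2; return 1 ;;
  esac
}
```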

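Connecting R to PostgreSQL

Use dbGetQuery for queries. For DDL and DML (CREATE, ALTER, INSERT, UPDATE, DELETE), which return no result set, use dbSendUpdate; dbSendQuery is meant for queries whose result set you then read with fetch().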

library(RJDBC)
pDriver <- JDBC(driverClass="org.postgresql.Driver", classPath="C:/Squirrel3.0/lib/postgresql-9.4.1211.jre6.jar")
# Same driver with a Windows-style path (backslashes must be escaped in R strings):
#pDriver <- JDBC(driverClass="org.postgresql.Driver", classPath="C:\\Squirrel3.0\\lib\\postgresql-9.4.1211.jre6.jar")
pconn <- dbConnect(pDriver, "jdbc:postgresql://postgresql_svr_name_ip:5432/databasename?autosave=always", "dbuser", "password")
input.data <- dbGetQuery(pconn, "select * from pg_tables limit 10")
dbDisconnect(pconn) # optional, but releases the connection

========================

Connecting R to Vertica

library(RJDBC)
vDriver <- JDBC(driverClass="com.vertica.jdbc.Driver", classPath="C:/Squirrel3.0/lib/vertica-jdbc-7.0.1-0.jar")
vconn <- dbConnect(vDriver, "jdbc:vertica://vertica_svr_name_ip:5433/databasename", "dbuser", "password")
input.data <- dbGetQuery(vconn, "select * from tables limit 10")

========================

Connecting R to Oracle

library(RJDBC)
oDriver <- JDBC(driverClass="oracle.jdbc.OracleDriver", classPath="C:\\Squirrel3.0\\lib\\ojdbc6.jar")
oconn <- dbConnect(oDriver, "jdbc:oracle:thin:@oracle_svr_name_ip:1521:databasename", "dbuser", "password")
# Oracle has no LIMIT clause; ROWNUM restricts the number of rows returned
input.data <- dbGetQuery(oconn, "select * from all_tables where rownum <= 10")

========================

Connecting R to MySQL

library(RJDBC)
mDriver <- JDBC(driverClass="com.mysql.jdbc.Driver", classPath="C:/Squirrel3.0/lib/mysql-connector-java-5.1.46.jar")
mconn <- dbConnect(mDriver, "jdbc:mysql://mysql_svr_name_ip:3306/databasename", "dbuser", "password")
input.data <- dbGetQuery(mconn, "select * from information_schema.tables limit 10")

========================

Connecting R to Microsoft SQL Server

Compatible with MSSQL versions before 2003

library(RJDBC)
mssDriver <- JDBC(driverClass="com.microsoft.sqlserver.jdbc.SQLServerDriver", classPath="C:/Squirrel3.0/lib/mssqljdbc41.jar")
mssconn <- dbConnect(mssDriver, "jdbc:sqlserver://mssql_svr_name_ip:1433;databaseName=databasename", "dbuser", "password")
# SQL Server uses TOP instead of LIMIT
input.data <- dbGetQuery(mssconn, "select top 10 * from sysobjects")

========================

Connecting R to Teradata

library(RJDBC)
# classPath can list several jars, separated by ";" on Windows
tDriver <- JDBC(driverClass="com.teradata.jdbc.TeraDriver", classPath="C:/Squirrel3.0/lib/terajdbc4.jar;C:/Squirrel3.0/lib/tdgssconfig.jar")
tconn <- dbConnect(tDriver, "jdbc:teradata://Tera_svr_name_ip/TMODE=TERA", "username", "password")
input.data <- dbGetQuery(tconn, "select top 10 * from dbc.dbcinfo")
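Across these databases, the sample queries differ mainly in how they limit the result to 10 rows: PostgreSQL, Vertica and MySQL use LIMIT, SQL Server and Teradata use TOP, and Oracle has no LIMIT clause (ROWNUM works on all versions; Oracle 12c and later also accept FETCH FIRST n ROWS ONLY). A small sketch of that mapping, assuming a plain select * from <table> query; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a "first N rows of a table" query for each
# database used in this post, since row-limiting syntax is not portable.
first_rows_sql() {
  local vendor=$1 table=$2 n=${3:-10}
  case $vendor in
    postgresql|vertica|mysql) echo "select * from ${table} limit ${n}" ;;
    sqlserver|teradata)       echo "select top ${n} * from ${table}" ;;
    oracle)                   echo "select * from ${table} where rownum <= ${n}" ;;
    *) echo "unknown vendor: $vendor" >&2; return 1 ;;
  esac
}
```

The printed string is what you would pass to dbGetQuery for that connection.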

 

All the jar files used above are in the Squirrel/lib directory. Download Squirrel SQL Client Portable: https://1drv.ms/u/s!AtVaUU1SN60KhTXulW5JdrTNGF0E