Learning Big Data

Created 2022-10-14 | Kettle

Kettle - Trigger-based CDC

Related table: student_cdc

    -- ----------------------------
    -- Table structure for student_cdc
    -- ----------------------------
    DROP TABLE IF EXISTS `student_cdc`;
    CREATE TABLE `student_cdc` (
      `学号` int(255) NOT NULL AUTO_INCREMENT,  -- student ID
      `姓名` varchar(255) DEFAULT NULL,         -- name
      `性别` varchar(255) DEFAULT NULL,         -- gender
      `班级` varchar(255) DEFAULT NULL,         -- class
      `年龄` varchar(255) DEFAULT NULL,         -- age
      `成绩` varchar(255) DEFAULT NULL,         -- score
      `身高` varchar(255) DEFA...
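The excerpt stops at the DDL, so here is a minimal sketch of the idea behind trigger-based CDC: triggers on the source table copy every INSERT, UPDATE and DELETE into a change table, and the ETL job (Kettle, in this post) then reads only the rows it has not processed yet. Assumptions: SQLite stands in for MySQL so the sketch runs with Python's standard library alone (MySQL's CREATE TRIGGER syntax differs slightly), and the table names `student`/`student_changes`, the simplified English columns, the trigger names and the `op` codes are all hypothetical, not the post's actual schema.

```python
import sqlite3

# Hypothetical schema: a simplified source table, a change table and three triggers.
SCHEMA = """
CREATE TABLE student (
    id    INTEGER PRIMARY KEY,
    name  TEXT,
    score INTEGER
);
-- Hypothetical change table: one row per captured change
CREATE TABLE student_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,   -- 'I' insert, 'U' update, 'D' delete
    id        INTEGER NOT NULL,
    name      TEXT,
    score     INTEGER
);
CREATE TRIGGER student_ai AFTER INSERT ON student BEGIN
    INSERT INTO student_changes (op, id, name, score)
    VALUES ('I', NEW.id, NEW.name, NEW.score);
END;
CREATE TRIGGER student_au AFTER UPDATE ON student BEGIN
    INSERT INTO student_changes (op, id, name, score)
    VALUES ('U', NEW.id, NEW.name, NEW.score);
END;
CREATE TRIGGER student_ad AFTER DELETE ON student BEGIN
    INSERT INTO student_changes (op, id, name, score)
    VALUES ('D', OLD.id, OLD.name, OLD.score);
END;
"""


def read_changes(conn, after_change_id=0):
    """Return captured changes newer than after_change_id, oldest first."""
    return conn.execute(
        "SELECT change_id, op, id, name, score FROM student_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (after_change_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO student VALUES (1, 'qiaofeng', 93)")
    conn.execute("UPDATE student SET score = 95 WHERE id = 1")
    conn.execute("DELETE FROM student WHERE id = 1")
    for change in read_changes(conn):
        print(change)
```

An incremental job would remember the largest `change_id` it has processed and pass it as `after_change_id` on the next run, so each change is read exactly once.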

Scala Installation and Basic Usage

Created 2022-10-02 | Scala

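Introduction to Scala

Scala runs on the Java Virtual Machine (JVM) and is compatible with existing Java programs.
Scala is a purely object-oriented language.
Scala is also a functional programming language.

Installing Scala

Scala depends on the JVM, so first install a JDK that matches your system (JDK installation is omitted here).

Downloading Scala

My Linux distribution is CentOS 7, so I downloaded the rpm package; on Windows, download the msi package. If you download a Scala archive instead, append SCALA_HOME/bin to the PATH environment variable.

    https://www.scala-lang.org/download/2.12.16.html

Installing on Linux

    [root@node0 ~]# chmod +x scala-2.12.16.rpm
    [root@node0 ~]# rpm -i scala-2.12.16.rpm

Installing on Windows

Double-click the msi file to install.

Basic Usage of Scala

The interpreter

    [root@node0 ~]# scala
    Welcome to Scala 2.12.16 (Java HotSp...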

Flume

Created 2022-03-23 | Flume

$FLUME_HOME/conf

    [zhangsan@node0 conf]$ mv flume-env.sh.template flume-env.sh
    [zhangsan@node0 conf]$ mv flume-conf.properties.template flume-conf.properties

Configure environment variables

Hello World

Sink to logger

Configuration file:

    # Name the components of agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # A source of agent a1 listens for data on port 44444
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    # Log event data to the console
    a1.sinks.k1.type = logger
    # Use a channel that buffers events in memory
    a1.channels.c1.type...
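Any TCP client can push test data into the netcat source configured above (localhost, port 44444). A minimal Python client sketch, assuming the agent is running with this configuration and that the netcat source keeps its default `ack-every-event=true`, so Flume answers each received line with `OK`. The function name `send_lines` is hypothetical.

```python
import socket


def send_lines(lines, host="localhost", port=44444, timeout=5.0):
    """Send each line as one event and return the reply line received for it."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            for line in lines:
                # The netcat source turns each newline-terminated line into one event.
                sock.sendall(line.encode("utf-8") + b"\n")
                # With ack-every-event left at its default, the reply should be "OK".
                replies.append(reader.readline().decode("utf-8").strip())
    return replies


if __name__ == "__main__":
    print(send_lines(["hello flume", "second event"]))
```

Each line sent should then show up as an event in the agent's console output through the logger sink.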

HBase Cluster Management

Created 2022-03-23 | HBase

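Adding a Hadoop node

On the master, add node4 to the slaves file.

Distribute the configuration (hosts, slaves).

Start the DataNode and NodeManager on node4:

    [zhangsan@node4 default]$ sbin/hadoop-daemon.sh start datanode
    [zhangsan@node4 default]$ sbin/yarn-daemon.sh start nodemanager

Refresh the node status:

    [zhangsan@node1 ~]$ hdfs dfsadmin -refreshNodes

Load balancing:

    # Set the balancer bandwidth (bytes per second)
    [zhangsan@node1 ~]$ hdfs dfsadmin -setBalancerBandwidth bytes
    # Disk-usage threshold, in percent. This command starts a process that writes its log to the logs directory; the process exits when balancing is complete.
    # -threshold
    # Default: 10%. This ensures that disk usage on each DataNode differs from the cluster's overall usage by no more than 10%.
    [zhangsan@node1 ~]$ st...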
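The `-threshold` rule can be checked by hand: a node's utilization is used/capacity, the cluster's utilization is total used/total capacity, and a node is out of balance when the two differ by more than the threshold in percentage points. A small sketch of that arithmetic; the function name `unbalanced_nodes` and the node figures are made up for illustration.

```python
def unbalanced_nodes(nodes, threshold=10.0):
    """Return {node: deviation} for DataNodes outside the balancer threshold.

    nodes maps a DataNode name to (used, capacity), both in the same unit.
    deviation = node utilization % - cluster utilization %, in percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity
    result = {}
    for name, (used, capacity) in nodes.items():
        deviation = 100.0 * used / capacity - cluster_pct
        if abs(deviation) > threshold:
            result[name] = deviation
    return result


if __name__ == "__main__":
    # Made-up figures in GB: node4 was just added and is still empty.
    cluster = {"node1": (700, 1000), "node2": (600, 1000),
               "node3": (500, 1000), "node4": (0, 1000)}
    for node, dev in unbalanced_nodes(cluster).items():
        print(f"{node}: {dev:+.1f} points from the cluster average")
```

With these figures the cluster sits at 45% (1800 of 4000), so node1 (+25), node2 (+15) and the freshly added node4 (-45) fall outside the default 10-point band, while node3 (+5) does not.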

SQLite Python API

Created 2022-03-23 | SQLite

Import sqlite3

    import sqlite3

Create a database connection

    conn = sqlite3.connect("test.db")

Get a cursor

    cursor = conn.cursor()

Create a table

    cursor.execute("create table user(id int primary key, name varchar(20) )")

Insert data

Insert a single row:

    cursor.execute('insert into user(id, name) values (1, \'aaa\')')
    print(cursor.rowcount)

Insert multiple rows with executemany(sqlstatement, values):

    cursor.executemany('insert into user(id,name) values (?,?)', [(2, 'AAA'), (3, 'BBB')])
    print(cursor.ro...
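The excerpt ends before the rows are committed or read back. A minimal end-to-end sketch, assuming an in-memory database (`":memory:"`) instead of test.db so it leaves no file behind; `demo` is a hypothetical helper name. It also uses `?` placeholders for the single-row insert instead of escaped quotes.

```python
import sqlite3


def demo(path=":memory:"):
    """Create the user table, insert rows, commit, and read them back."""
    conn = sqlite3.connect(path)
    try:
        cursor = conn.cursor()
        cursor.execute("create table user(id int primary key, name varchar(20))")
        # ? placeholders let sqlite3 quote values, so no manual \' escaping is needed
        cursor.execute("insert into user(id, name) values (?, ?)", (1, "aaa"))
        cursor.executemany("insert into user(id, name) values (?, ?)",
                           [(2, "AAA"), (3, "BBB")])
        inserted = cursor.rowcount  # summed over all rows of the executemany() call
        conn.commit()  # without commit, the inserts would be discarded on close
        cursor.execute("select id, name from user order by id")
        return inserted, cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```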

HBase and MapReduce

Created 2022-03-23 | HBase

HBase can serve as an input data source for MapReduce, as an output destination for MapReduce, and even as a shared resource during a MapReduce job.

    (python37) [zhangsan@node0 default]$ bin/hbase mapredcp
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/bigdata/hbase/hbase-1.4.13/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.cl...

HBase Data Query

Created 2022-03-23 | HBase

Data preparation

Create a table

Create the Student table with two column families, Info and Score:

    hbase(main):047:0> create 'Student','Info','Score'
    0 row(s) in 2.3500 seconds

Insert data

Batch insert:

    No    Info              Score
          name       age    Hadoop  HBase  Spark
    001   qiaofeng   30     93      85     70
    002   duanyu     27     95      98     50
    003   wangyuyan  18     95      97     92
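The excerpt shows the data but not the batch-insert commands. One way to produce them is to generate one HBase shell `put 'table', 'rowkey', 'family:qualifier', 'value'` command per cell. A sketch, with the hypothetical names `STUDENTS` and `put_commands`; it assumes no value contains a single quote, since quoting is not escaped.

```python
# The table above as row key -> {column family: {qualifier: value}}
STUDENTS = {
    "001": {"Info": {"name": "qiaofeng", "age": "30"},
            "Score": {"Hadoop": "93", "HBase": "85", "Spark": "70"}},
    "002": {"Info": {"name": "duanyu", "age": "27"},
            "Score": {"Hadoop": "95", "HBase": "98", "Spark": "50"}},
    "003": {"Info": {"name": "wangyuyan", "age": "18"},
            "Score": {"Hadoop": "95", "HBase": "97", "Spark": "92"}},
}


def put_commands(table, rows):
    """Yield one HBase shell put command per cell (values must not contain quotes)."""
    for row_key, families in rows.items():
        for family, columns in families.items():
            for qualifier, value in columns.items():
                yield f"put '{table}', '{row_key}', '{family}:{qualifier}', '{value}'"


if __name__ == "__main__":
    script = list(put_commands("Student", STUDENTS)) + ["exit"]  # exit leaves the shell
    print("\n".join(script))
```

The printed script can be saved to a file and run non-interactively with `hbase shell <file>`; the trailing `exit` makes the shell quit once all 15 puts have run.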

ZooKeeper

Created 2022-03-23 | ZooKeeper

ZooKeeper Environment Deployment

Official documentation: https://zookeeper.apache.org/doc/r3.4.14/index.html

Download: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.14/

Installation directory: /opt/bigdata/zookeeper

Deployment plan: node0 runs a single node; node1, node2 and node3 are configured as a ZooKeeper cluster.

Standalone Mode (single node)

https://zookeeper.apache.org/doc/r3.4.14/zookeeperStarted.html#sc_InstallingSingleMode

Download

Upload the installation package with Xftp, or download it online with a tool such as wget.

    [zhangsan@node0 ~]$ cd /opt/bigdata/zookeeper/
    # Download online with wget here
    [zhangsan@node0 zookeeper]$ wget https://archive.apache.org/dist/zookeeper/zoo...
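For the node1/node2/node3 ensemble in the deployment plan, every server needs the same zoo.cfg listing all members as `server.N=host:2888:3888`, plus a `myid` file in its dataDir holding its own N. A sketch that renders both, following the replicated-mode example in the ZooKeeper 3.4 Getting Started guide (tickTime=2000, initLimit=5, syncLimit=2); the dataDir path `/opt/bigdata/zookeeper/data` and the function names `zoo_cfg`/`myid_files` are assumptions.

```python
def zoo_cfg(hosts, data_dir, client_port=2181,
            tick_time=2000, init_limit=5, sync_limit=2):
    """Render zoo.cfg; an empty host list gives a standalone-style config."""
    lines = [
        f"tickTime={tick_time}",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
        f"initLimit={init_limit}",  # only used by an ensemble
        f"syncLimit={sync_limit}",  # only used by an ensemble
    ]
    # server.N=host:peerPort:electionPort; N must match that host's myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_files(hosts):
    """Map each host to the content of <dataDir>/myid on that host."""
    return {host: f"{server_id}\n" for server_id, host in enumerate(hosts, start=1)}


if __name__ == "__main__":
    ensemble = ["node1", "node2", "node3"]
    print(zoo_cfg(ensemble, "/opt/bigdata/zookeeper/data"))  # assumed dataDir
    print(myid_files(ensemble))
```

Passing an empty host list omits the `server.N` lines, which matches the standalone layout used on node0.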

HBase Source Code

Created 2022-03-23 | HBase

HMaster election

    private void startActiveMasterManager(int infoPort) throws KeeperException {
      String backupZNode = ZNodePaths.joinZNode(
          zooKeeper.getZNodePaths().backupMasterAddressesZNode, serverName.toString());
      /*
       * Add a ZNode for ourselves in the backup master directory since we
       * may not become the active master. If so, we want the actual active
       * master to know we are backu...
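The comment in the excerpt describes the first half of HMaster election: each master registers a backup znode, then tries to create the ephemeral master znode, and whichever succeeds becomes active. Below is a toy model of that race, not HBase code: `ToyZooKeeper` is a hypothetical in-memory stand-in for ZooKeeper, a session is represented by the server name, and the paths are HBase's default znode names (`/hbase/master`, `/hbase/backup-masters`). In real HBase a backup master watches the master znode and retries when it disappears; here the retry is simply a second call.

```python
class ToyZooKeeper:
    """Hypothetical in-memory stand-in: only what the election model needs."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session)

    def create_ephemeral(self, path, data, session):
        """Create path if absent (atomic in real ZooKeeper); False if it exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = (data, session)
        return True

    def delete(self, path):
        self.nodes.pop(path, None)

    def expire_session(self, session):
        """Ephemeral nodes vanish when the session that created them ends."""
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]


MASTER = "/hbase/master"
BACKUP_DIR = "/hbase/backup-masters"


def try_become_active(zk, server_name):
    """Register as a backup first, then race for the master znode."""
    backup = f"{BACKUP_DIR}/{server_name}"
    zk.create_ephemeral(backup, server_name, session=server_name)
    if zk.create_ephemeral(MASTER, server_name, session=server_name):
        zk.delete(backup)  # the winner is no longer a backup
        return True
    return False


if __name__ == "__main__":
    zk = ToyZooKeeper()
    print(try_become_active(zk, "master-a"))
    print(try_become_active(zk, "master-b"))
    zk.expire_session("master-a")  # active master dies; its ephemeral znode goes away
    print(try_become_active(zk, "master-b"))
```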

HBase

Created 2022-03-23 | HBase

HBase Environment Deployment

Official documentation: https://hbase.apache.org/1.4/book.html

Download: https://archive.apache.org/dist/hbase/1.4.13/

Installation directory: /opt/bigdata/hbase/

Configuration files: https://hbase.apache.org/1.4/book.html#_configuration_files

Deployment plan: node0 runs Standalone mode and Pseudo-Distributed mode; node1, node2 and node3 are configured as an HBase cluster.

Standalone

https://hbase.apache.org/1.4/book.html#quickstart

Preparation

Upload

    [zhangsan@node0 bigdata]$ cd /opt/bigdata/
    [zhangsan@node0 bigdata]$ mkdir hbase

Use an FTP tool to upload the archive hbase-1.4.13-bin.tar.gz to the /opt/bigdata/hbase directory...
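For the node1/node2/node3 cluster in the deployment plan, conf/hbase-site.xml typically sets `hbase.cluster.distributed`, `hbase.rootdir` and `hbase.zookeeper.quorum`. A sketch that renders such a file from a dict; the NameNode address `hdfs://node1:9000` and the names `hbase_site`/`CLUSTER` are assumptions, so adjust them to your Hadoop setup.

```python
import xml.etree.ElementTree as ET


def hbase_site(properties):
    """Render a Hadoop-style <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


CLUSTER = {
    "hbase.cluster.distributed": "true",
    "hbase.rootdir": "hdfs://node1:9000/hbase",  # assumed NameNode address
    "hbase.zookeeper.quorum": "node1,node2,node3",
}

if __name__ == "__main__":
    print(hbase_site(CLUSTER))
```

When HBase should use the separate ZooKeeper ensemble instead of managing its own, `HBASE_MANAGES_ZK=false` is also set in conf/hbase-env.sh.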