Learn Big Data

Python Practice Exercises
Created 2023-02-11 | Python
Python Overview
Checking the Python version
Installing numpy
Uninstalling numpy
Listing installed extension libraries
Coding Conventions
Fix the following code so that it runs correctly.
1.
a = 99
if a<60:
print("Fail")
elif a<80:
print("Average")
elif a<100:
print("Excellent")
2.
for i in [1,2,3]:
print(i)
3.
def sum(a,b):
return a+b
s = sum(4,5)
print(s)
4.
class Dog:
def __init__(self, n):
self.name = n
def say(self):
print(f"My name is {self.name}")
print("wang wang ~")
dudu = Dog("嘟嘟")
dudu.say()
Data Types
Create a variable d with the value -10. Create a variable f...
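For reference, here is a minimal sketch of the four snippets with the indentation Python requires (four spaces per level, as PEP 8 recommends). It is one reasonable fix, not the post's official answer, and the printed messages are English translations of the original strings.

```python
# Exercise 1: each branch body is indented under its if/elif header.
a = 99
if a < 60:
    print("Fail")
elif a < 80:
    print("Average")
elif a < 100:
    print("Excellent")

# Exercise 2: the loop body is indented under the for statement.
for i in [1, 2, 3]:
    print(i)


# Exercise 3: the function body is indented. Naming the function `sum`
# shadows Python's built-in sum() for the rest of this module.
def sum(a, b):
    return a + b


s = sum(4, 5)
print(s)


# Exercise 4: methods are indented inside the class, and the statements
# of each method are indented one level further.
class Dog:
    def __init__(self, n):
        self.name = n

    def say(self):
        print(f"My name is {self.name}")
        print("wang wang ~")


dudu = Dog("嘟嘟")
dudu.say()
```

Running it prints `Excellent`, then `1`, `2`, `3`, then `9`, and finally the dog's two lines.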
Spark SQL
Created 2023-01-15 | Spark
Spark DataFrame & Spark SQL
Environment initialization
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
val conf = new SparkConf().setAppName("Spark SQL").setMaster("local[*]")
val spark = SparkSession.builder().config(conf).getOrCreate()
import spark.implicits._
Test data
The sample data bundled with the org.apache.spark.spark-examples_2.11 module of the Spark project.
Building a DataFrame
import org.apache.spark.sql._
import org.apache.spark.sql.types._
// SparkSession's constructor is not public API; obtain the session through the builder
val sparkSession = SparkSession.builder().getOrCreate()
val schema =
StructType(StructField("name", StringType, false)::...
Hadoop Environment Deployment - Standalone
Created 2023-01-15 | Hadoop
Standalone
Official documentation: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
Installation
Download:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Copy the archive to the installation directory:
[zhangsan@node0 ~]$ cp hadoop-2.7.3.tar.gz /opt/bigdata/hadoop/
Extract it:
[zhangsan@node0 ~]$ cd /opt/bigdata/hadoop/
[zhangsan@node0 hadoop]$ tar -zxf hadoop-2.7.3.tar.gz
Create a symbolic link:
[zhangsan@node0 hadoop]$ ln -s hadoop-2.7.3 default
[zhangsan@node0 hadoop]$ ll
lrwxrwxrwx. 1 zhangsan...
Hadoop Environment Deployment - Pseudo-Distributed
Created 2023-01-15 | Hadoop
Pseudo-Distributed
Official documentation: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
Passwordless login
If passwordless login has not been configured, connecting to node0 with ssh asks you to confirm the host key on the first connection and then prompts for a password.
[zhangsan@node0 ~]$ ssh node0
The authenticity of host 'node0 (192.168.179.100)' can't be established.
ECDSA key fingerprint is SHA256:1+3DDeEwkWu0zRO1RoxISbQoKTSgZ56QO3Rl4XXteTw.
ECDSA key fingerprint is MD5:92:c9:cd:4a:b8:07:29:ff:3d:25:1c:45:db:8b:5f:dc.
Are you sure you want to continue connecting (yes/n...
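Hadoop Environment Deployment - Fully Distributed
Created 2023-01-15 | Hadoop
Fully Distributed
Official documentation:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
Node cloning and passwordless login
Clone the preconfigured machine into three machines named node1, node2, and node3, and give each one an IP address that fits your VMware network settings, for example:
Hostname    IP
node1       192.168.179.101
node2       192.168.179.102
node3       192.168.179.103
Changing the hostname: omitted.
Changing the IP address: omitted.
Configuring hosts
/etc/hosts (apply the following on all three nodes):
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localh...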
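Because every node must resolve node1 to node3 to the planned addresses, it can help to check the hosts entries against the table before starting Hadoop. The Python sketch below is a hypothetical helper, not part of Hadoop: `parse_hosts`, `missing_or_wrong`, and the `EXPECTED` table are illustrative names built from the example IPs, and the parser only understands the simple `IP name [alias ...]` line format with `#` comments.

```python
# Hypothetical helper for checking /etc/hosts content; not part of Hadoop.
EXPECTED = {
    "node1": "192.168.179.101",
    "node2": "192.168.179.102",
    "node3": "192.168.179.103",
}


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue  # skip blank and comment-only lines
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first mapping seen for a name
    return mapping


def missing_or_wrong(text, expected=EXPECTED):
    """Return {name: actual_ip_or_None} for every node that does not match the plan."""
    hosts = parse_hosts(text)
    return {
        name: hosts.get(name)
        for name, ip in expected.items()
        if hosts.get(name) != ip
    }


sample = """127.0.0.1   localhost localhost.localdomain
192.168.179.101 node1
192.168.179.102 node2
192.168.179.103 node3
"""
print(missing_or_wrong(sample))  # → {}
```

An empty result means every planned node is present with the expected address; any other entry names the node to fix on that machine.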
Hadoop Windows Dev Env
Created 2023-01-15 | Hadoop
Setting up the Windows development environment
Maven
Download Maven: https://archive.apache.org
Simply extract the archive that has already been provided.
Maven configuration: MAVEN_HOME/conf/settings.xml
<!-- Local repository: change the storage location to suit your setup -->
<localRepository>D:\maven-repo</localRepository>

<!-- Remote repository: aliyun mirror -->
<mirrors>
  <mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf...
Hadoop_MapReduce
Created 2023-01-15 | Hadoop
Maven dependencies (pom.xml):
org.apache.hadoop:hadoop-client:2.7.3
org.apache.logging.log4j:log4j-api:2.18.0
org.apache.logging.log4j:log4j-core:2.18.0
Build plugin:
org.apache.maven.plugins:maven-jar-plugin:2.6
cn....
Hadoop Partitioner
Created 2023-01-15 | Hadoop
Partition computation
Employee
package cn.studybigdata.hadoop.mapred.dpartition;

import org.apache.hadoop.io.Writable;
import java.io.DataInput;
import java.io.DataOutput;
import java....
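The excerpt stops at the Employee class, before the partitioner itself. For context, Hadoop's default HashPartitioner sends a record to reduce task `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below reproduces that arithmetic, assuming an IntWritable key such as a department number (IntWritable.hashCode() returns the wrapped int). The post's own custom partitioner is not shown in the excerpt and may use a different rule; `to_java_int` and `hash_partition` are illustrative names.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def to_java_int(x):
    """Wrap a Python int into the signed 32-bit range of a Java int."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def hash_partition(key_hash, num_reduce_tasks):
    """HashPartitioner rule: (hashCode & Integer.MAX_VALUE) % numReduceTasks."""
    # Masking with INT_MAX clears the sign bit, so the result is never negative.
    return (to_java_int(key_hash) & INT_MAX) % num_reduce_tasks


# An IntWritable key hashes to its own value, so department numbers
# are spread over 3 reduce tasks purely by their remainder.
for dept in (10, 20, 30):
    print(dept, "->", hash_partition(dept, 3))
# prints 10 -> 1, 20 -> 2, 30 -> 0 (one per line)
```

The mask matters for negative hash codes: `hash_partition(-5, 3)` is 0 rather than a negative index.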
Hadoop Combiner
Created 2023-01-15 | Hadoop
Combiner
Map
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

/**
 * Input: <0, Text>
 * hello word
 * hello bigdata
 * hello spark word
 *
 *
 * Output: <word, 1>
 *
 * hello 3
 * word 2
 * spark 1
 * bigdata 1
 *
 */
...
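To make the combiner's role concrete, here is a plain-Python simulation of the word count described in the Map class comment: each map task emits `(word, 1)`, the combiner pre-sums its own task's output, and the reducer adds up the partial sums. This is not Hadoop code, and grouping the three input lines into two map tasks is a hypothetical choice made so the combiner visibly shrinks the shuffled data.

```python
from collections import Counter


def map_phase(line):
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts locally, over one map task's output only."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())


def reduce_phase(all_pairs):
    """Reducer: sum the (possibly pre-combined) counts per word."""
    totals = Counter()
    for word, n in all_pairs:
        totals[word] += n
    return dict(totals)


# Hypothetical input splits: the first map task gets two lines, the second one.
splits = [["hello word", "hello bigdata"], ["hello spark word"]]
shuffled = []
for split in splits:
    mapped = [pair for line in split for pair in map_phase(line)]
    shuffled.extend(combine(mapped))  # 6 pairs reach the shuffle instead of 7

print(reduce_phase(shuffled))  # → {'hello': 3, 'word': 2, 'bigdata': 1, 'spark': 1}
```

Pre-summing is safe here because addition is associative and commutative. Hadoop may run a combiner zero, one, or several times, so a combiner must never change the final result, only reduce the volume of intermediate data.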
Hadoop GroupBy
Created 2023-01-15 | Hadoop
Group-By Aggregation
Mapper
package cn.studybigdata.hadoop.mapred.bdeptsalary;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class DeptMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    //7369,SMITH,CLERK,7902,1980/12/17,800,,20
    @Override
    protected void map(LongWritable key, Text value, Co...
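The mapper's output types (IntWritable, IntWritable) and the package name bdeptsalary suggest that it emits `(deptno, sal)` pairs which the reducer aggregates per department. Assuming that, and assuming the aggregate is a sum, the Python sketch below mimics the whole job on rows in the commented format. `parse_emp` and `total_salary_by_dept` are hypothetical names; only the first row comes from the post, and the other two are illustrative rows in the same layout.

```python
from collections import defaultdict


def parse_emp(line):
    """Mapper side: turn one EMP row into a (deptno, sal) pair.

    Assumed column layout, from the sample row in DeptMapper's comment:
    empno,ename,job,mgr,hiredate,sal,comm,deptno  (comm may be empty)
    """
    fields = line.split(",")
    return int(fields[7]), int(fields[5])


def total_salary_by_dept(lines):
    """Shuffle + reduce side: group pairs by deptno and sum the salaries."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        dept, sal = parse_emp(line)
        totals[dept] += sal
    return dict(sorted(totals.items()))


rows = [
    "7369,SMITH,CLERK,7902,1980/12/17,800,,20",        # sample row from the post
    "7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30",  # illustrative
    "7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30",   # illustrative
]
print(total_salary_by_dept(rows))  # → {20: 800, 30: 2850}
```

Note that the empty commission column in SMITH's row is harmless because the mapper only reads columns 5 and 7.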
© 2025 - 2026 By QuZheng | Framework Hexo 8.1.1 | Theme Butterfly 5.5.4