克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
Apache-2.0

PySpark Tutorial

  • PySpark is the Python API for Spark.

  • The purpose of PySpark tutorial is to provide basic distributed algorithms using PySpark.

  • PySpark supports two types of Data Abstractions:

    • RDDs
    • DataFrames
  • PySpark Interactive Mode: has an interactive shell ($SPARK_HOME/bin/pyspark) for basic testing and debugging and is not supposed to be used for production environment.

  • PySpark Batch Mode: you may use $SPARK_HOME/bin/spark-submit command for running PySpark programs (may be used for testing and production environemtns)


Glossary: big data, MapReduce, Spark


Basics of PySpark with Examples


PySpark Examples and Tutorials


Books

Data Algorithms with Spark

Data Algorithms

PySpark Algorithms


Miscellaneous

Download, Install Spark and Run PySpark

How to Minimize the Verbosity of Spark


PySpark Tutorial and References...


Questions/Comments

Thank you!

best regards,
Mahmoud Parsian

Data Algorithms with Spark Data Algorithms with Spark PySpark Algorithms Data Algorithms
Copyright [2019] [Mahmoud Parsian] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

简介

暂无描述 展开 收起
Jupyter Notebook 等 3 种语言
Apache-2.0
取消

发行版

暂无发行版

贡献者

全部

近期动态

不能加载更多了
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化