Copy File from Cloud HDFS to Local Computer

When I work with big data technologies like Spark and a large dataset, I like to work on the university cloud, where everything is faster. However, for various reasons I sometimes have to move to my local computer (my laptop). This time the reason is that I need a matplotlib package for Python, named baseplot, which is not installed on the cloud. The data I need to work on, however, is on the cloud HDFS, so I need to copy it from HDFS to my laptop. This can be done in two simple steps:

Step 1: copy the data from HDFS to the remote machine's local filesystem (not HDFS)
Step 2: copy the data from the remote machine's local filesystem to my laptop

I have a directory named tmax-1 on HDFS. I can check its contents like this:

abc@my-cloud:~$ hdfs dfs -ls /courses/123/tmax-1
Found 3 items
-rw-r--r--   2 ggbaker hdfs          0 2017-11-04 09:54 /courses/123/tmax-1/_SUCCESS
-rw-r--r--   2 ggbaker hdfs       6852 2017-11-04 09:54 /courses/123/tmax-1/part-00000-77a0f6ea-9bd1-4d00-88f2-d833e4d6d0b4-c000.csv.gz
-rw-r--r--   2 ggbaker hdfs      10318 2017-11-04 09:54 /courses/123/tmax-1/part-00001-77a0f6ea-9bd1-4d00-88f2-d833e4d6d0b4-c000.csv.gz

Step 1: Now copy the directory from HDFS to the remote computer's local filesystem

abc@my-cloud:~$ hdfs dfs -copyToLocal /courses/123/tmax-1 tmax-1
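
Before moving on, it can be worth checking that the copy is complete. Below is a minimal sketch of such a check, assuming the directory was written by Spark, which leaves an empty _SUCCESS marker next to the part files (as in the listing above). The function name check_spark_output is my own invention, not a Hadoop or Spark command.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a copied Spark output directory.
# check_spark_output is a hypothetical helper, not part of Hadoop or Spark.
check_spark_output() {
    local dir="$1"
    local count=0
    local f

    # The _SUCCESS marker should have come across with the data files.
    if [ ! -f "$dir/_SUCCESS" ]; then
        echo "missing _SUCCESS in $dir" >&2
        return 1
    fi

    # Count the part-* files sitting directly inside the directory.
    for f in "$dir"/part-*; do
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done

    echo "$dir: $count part files"
}
```

Calling check_spark_output tmax-1 on the remote machine should report the same number of part files that hdfs dfs -ls showed. The same function can be reused on the laptop once the second step is done.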

Step 2: Now copy that directory to the local machine (laptop) using the scp command

shanto@shanto:~/Desktop/BigData$ scp -r abc@my-cloud:tmax-1 data/tmax-1
abc@my-cloud's password: 
part-00009-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  274KB 273.5KB/s   00:00    

... ... ...

part-00042-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  301KB 301.2KB/s   00:01    
part-00028-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  301KB 300.6KB/s   00:00    
part-00010-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  279KB 279.0KB/s   00:00    
part-00023-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  293KB 292.8KB/s   00:00    
_SUCCESS                                                                            100%    0     0.0KB/s   00:00    
part-00033-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  305KB 305.2KB/s   00:00    
part-00016-fc2926ee-4904-4e51-bc4f-3d9add9f23e3-c000.csv.gz                         100%  272KB 272.0KB/s   00:00    

Now the files should be copied to my laptop. Here I check using ls.

shanto@shanto:~/Desktop/BigData$ ls data/
tmax-1  tmax-2
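
If I had to do this often, the two steps could be wrapped in one function that runs from the laptop. This is only a sketch under a few assumptions: fetch_from_hdfs and DRY_RUN are names I made up, abc@my-cloud stands for the cloud login used above, and the hdfs command has to be on the PATH of a non-interactive ssh session, which may not be true on every cluster.

```shell
#!/usr/bin/env bash
# Sketch: run both copy steps from the laptop.
# fetch_from_hdfs and DRY_RUN are hypothetical names chosen for illustration.
fetch_from_hdfs() {
    local remote="$1"      # ssh login, e.g. abc@my-cloud
    local hdfs_path="$2"   # path on HDFS, e.g. /courses/123/tmax-1
    local dest="$3"        # destination on the laptop
    local name
    local run=""

    name=$(basename "$hdfs_path")

    # With DRY_RUN=1, print each command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # Step 1: HDFS -> remote machine's local filesystem (remote home directory).
    $run ssh "$remote" "hdfs dfs -copyToLocal $hdfs_path $name" || return 1

    # Step 2: remote machine's local filesystem -> laptop.
    $run scp -r "$remote:$name" "$dest"
}
```

Running DRY_RUN=1 fetch_from_hdfs abc@my-cloud /courses/123/tmax-1 data/tmax-1 only prints the ssh and scp commands, which is a safe way to check them before copying anything. Without DRY_RUN the function asks for the password twice, once for each step, unless ssh keys are set up.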
