Swift integration
from: https://savanna.

Hadoop and Swift integration is the natural continuation of the Hadoop and OpenStack marriage. There were two steps to achieve this:
Swift patching
If you are still using Folsom, you need to follow these steps. First, add the list_endpoints filter to the proxy pipeline in proxy-server.conf:
[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit swift3 s3token list_endpoints authtoken keystone proxy-server
Next, add the definition of the new filter:

[filter:list_endpoints]
use = egg:swift#${list_endpoints}
# list_endpoints_path = /endpoints/

Here ${list_endpoints} resolves to swift.common.middleware.list_endpoints:filter_factory. The list_endpoints_path parameter is optional and defaults to "endpoints"; it is used to construct the HTTP request path (see details below).
Was Swift patched successfully?
You can check whether the patching succeeded by sending the following HTTP requests:

http://${proxy}:8080/endpoints/${account}/${container}/${object}
http://${proxy}:8080/endpoints/${account}/${container}
http://${proxy}:8080/endpoints/${account}
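For illustration, here is a small Python sketch that builds such an endpoints request URL and pulls the storage-node IPs out of a response. The sample response body and the helper names (`endpoints_url`, `node_ips`) are assumptions for illustration only; the exact format returned by the patched middleware may differ.

```python
import json
from urllib.parse import urlparse

def endpoints_url(proxy, account, container=None, obj=None, path="endpoints"):
    """Build a list_endpoints request URL for the given Swift entity."""
    parts = [p for p in (account, container, obj) if p is not None]
    return "http://%s:8080/%s/%s" % (proxy, path, "/".join(parts))

def node_ips(response_body):
    """Extract storage-node IPs from a JSON list of endpoint URLs."""
    return [urlparse(u).hostname for u in json.loads(response_body)]

# Hypothetical response body; the real output depends on the Swift patch.
sample = ('["http://10.0.0.1:6000/sdb1/123/AUTH_demo/cont/obj",'
          ' "http://10.0.0.2:6000/sdc1/123/AUTH_demo/cont/obj"]')
print(endpoints_url("192.168.1.5", "AUTH_demo", "cont", "obj"))
print(node_ips(sample))
```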
You don't need any additional headers or authorization here (the ${list_endpoints} filter sits before the 'authtoken' filter in the pipeline; see the previous section). The response contains the IPs of all Swift nodes that hold the corresponding object.

Hadoop patching
You can build the jar file yourself from the latest patch at https://issues./jira/browse/HADOOP-8545, or get the latest prebuilt one from the repository: https://github.com/stackforge/savanna-extra/blob/master/hadoop-swift/hadoop-swift-latest.jar

Put this file into the Hadoop libraries directory (e.g. /usr/lib/share/hadoop/lib) on each job-tracker and task-tracker node in the cluster. The main step in this section is to configure the core-site.xml file on each of these nodes.

Hadoop configurations
All of these configs may be overridden by a Hadoop job or set in core-site.xml using this template:

<property>
<name>${name} + ${config}</name>
<value>${value}</value>
<description>${not mandatory description}</description>
</property>
There are two types of configs here:
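As an illustration, here is the template instantiated for a provider-specific config, using the username setting that the distcp example below also passes on the command line (the value is illustrative):

```xml
<property>
  <name>fs.swift.service.savanna.username</name>
  <value>admin</value>
  <description>Swift user for the "savanna" provider</description>
</property>
```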
Example
By this point Swift and Hadoop are ready for use, and all the Hadoop configs are in place. In the example below the provider's name is savanna. Let's copy one object to another within a single Swift container and account, e.g. /dev/integration/temp to /dev/integration/temp1. We will use distcp for this purpose: http://hadoop./docs/r0.19.0/distcp.html

How do you write a Swift path? In our case it looks as follows: swift://integration.savanna/temp, so the template is swift://${container}.${provider}/${object}. We don't need to specify the account because it is determined automatically from the tenant name in the configs (in effect, account = tenant). Let's run the job:

$ hadoop distcp -D fs.swift.service.savanna.username=admin \
-D fs.swift.service.savanna.password=swordfish \
swift://integration.savanna/temp swift://integration.savanna/temp1
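To make the path convention concrete, here is a minimal Python sketch of how such a swift:// path decomposes into its parts (an illustration of the naming scheme only, not the actual Hadoop filesystem code; `parse_swift_path` is a hypothetical helper):

```python
from urllib.parse import urlparse

def parse_swift_path(path):
    """Split swift://${container}.${provider}/${object} into its parts.

    The account is not part of the path: it is derived from the
    tenant name in the Hadoop configs (account = tenant).
    """
    parsed = urlparse(path)
    if parsed.scheme != "swift":
        raise ValueError("not a swift path: %r" % path)
    container, _, provider = parsed.netloc.partition(".")
    return container, provider, parsed.path.lstrip("/")

print(parse_swift_path("swift://integration.savanna/temp"))
# ('integration', 'savanna', 'temp')
```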
After that, just check that temp1 was created.

Limitations
Note: the container name must be a valid URI.