
Hadoop Source Code Analysis (Part 2): HDFS NameNode FSNamesystem Initialization, Loading the fsimage and edits log

 印度阿三17 2021-01-16

The previous article walked through how NameNodeHttpServer starts during NameNode startup. In short, Hadoop's own HttpServer2 implementation is bound to an InetSocketAddress, i.e. a port number taken from the default configuration files, and a set of servlets is then registered on HttpServer2 to handle URL requests. That is all it takes for the NameNode to serve requests on port 50070.

This article looks at the second, and arguably the most important, core component of the NameNode process: FSNamesystem. Why the most important? Because both metadata management and block management go through it.

Before diving in, recall the fsimage file (a full snapshot of the namespace) and the edits log file (an incremental transaction log), and why these two files matter.
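To make the division of labor between the two files concrete, here is a minimal, self-contained sketch (toy code, not Hadoop's; all names are made up for illustration) of how a full snapshot plus an incremental journal reproduce the latest in-memory state:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: "fsimage" is a full snapshot of the namespace at some txid,
// "edits" is the ordered list of transactions recorded after that snapshot.
public class SnapshotPlusJournal {
  public static void main(String[] args) {
    // snapshot taken at txid 2: two paths already exist
    Map<String, String> fsimage = new HashMap<>();
    fsimage.put("/user", "dir");
    fsimage.put("/user/a.txt", "file");

    // edits recorded after the snapshot (txid 3 and 4)
    List<String[]> edits = new ArrayList<>();
    edits.add(new String[]{"ADD", "/user/b.txt"});
    edits.add(new String[]{"DELETE", "/user/a.txt"});

    // replaying the journal on top of the snapshot yields the latest namespace,
    // which is what the NameNode ends up holding in memory after startup
    Map<String, String> namespace = new HashMap<>(fsimage);
    for (String[] op : edits) {
      if ("ADD".equals(op[0])) {
        namespace.put(op[1], "file");
      } else if ("DELETE".equals(op[0])) {
        namespace.remove(op[1]);
      }
    }
    System.out.println(namespace); // {/user=dir, /user/b.txt=file}
  }
}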

Finding the entry point to FSNamesystem

Go back to where the NameNode is initialized; the loadNamesystem(conf) call inside initialize() is the entry point where the core FSNamesystem component is built.

 /**
   * Initialize name-node.
   * 
   * @param conf the configuration
   */
  protected void initialize(Configuration conf) throws IOException {
    //These settings correspond to entries in hdfs-site.xml, core-default.xml and similar files;
    //they are not particularly important here
    if (conf.get(HADOOP_USER_GROUP_METRICS_PERCENTILES_INTERVALS) == null) {
      String intervals = conf.get(DFS_METRICS_PERCENTILES_INTERVALS_KEY);
      if (intervals != null) {
        conf.set(HADOOP_USER_GROUP_METRICS_PERCENTILES_INTERVALS,
          intervals);
      }
    }

    UserGroupInformation.setConfiguration(conf);
    loginAsNameNodeUser(conf);

    NameNode.initMetrics(conf, this.getRole());
    StartupProgressMetrics.register(startupProgress);
    //If this is a NameNode, start an HttpServer
    if (NamenodeRole.NAMENODE == role) {
      startHttpServer(conf);
    }

    this.spanReceiverHost = SpanReceiverHost.getInstance(conf);

    // Initialize the core FSNamesystem component
    loadNamesystem(conf);

    //Initialize the RPC server component
    rpcServer = createRpcServer(conf);
    if (clientNamenodeAddress == null) {
      // This is expected for MiniDFSCluster. Set it now using 
      // the RPC server's bind address.
      clientNamenodeAddress = 
          NetUtils.getHostPortString(rpcServer.getRpcAddress());
      LOG.info("Clients are to use "   clientNamenodeAddress   " to access"
            " this namenode/service.");
    }
    //If this is a NameNode, set the NameNodeAddress and the FSImage on the HTTP server
    if (NamenodeRole.NAMENODE == role) {
      httpServer.setNameNodeAddress(getNameNodeAddress());
      httpServer.setFSImage(getFSImage());
    }
    
    pauseMonitor = new JvmPauseMonitor(conf);
    pauseMonitor.start();
    metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);
    //Initialize the common services
    startCommonServices(conf);
  }

What happens inside? Let's step in and take a look.

 protected void loadNamesystem(Configuration conf) throws IOException {
    //Create and initialize the FSNamesystem.
    //When the NameNode starts, it reads the on-disk fsimage file and the edits log file
    //into memory and merges them into the latest metadata.
    //loadFromDisk is the call that reads fsimage and the edits log into memory and merges them;
    //the merged, up-to-date metadata is then held in memory by the FSNamesystem.
    this.namesystem = FSNamesystem.loadFromDisk(conf);
  }

As before, read the comments carefully if they exist, and combine them with your own background knowledge to make an educated guess about what the method does.

 /**
   * Instantiates an FSNamesystem loaded from the image and edits
   * directories specified in the passed Configuration.
   * In other words: instantiate an FSNamesystem by loading the image and edits directories
   * specified in the configuration; if none are specified, Hadoop falls back to its default directories.
   *
   * @param conf the Configuration which specifies the storage directories
   *             from which to load
   *  i.e. the Configuration that carries the storage directories; defaults exist, but the conf still has to be passed in explicitly.
   *
   * @return an FSNamesystem which contains the loaded namespace
   *  i.e. an FSNamesystem instance holding the loaded namespace metadata
   *
   * @throws IOException if loading fails
   */
  static FSNamesystem loadFromDisk(Configuration conf) throws IOException {

    checkConfiguration(conf);
    //Build the FSImage; it is loaded from disk.
    //An FSImage is a point-in-time snapshot of the metadata, i.e. the metadata itself.
    //FSNamesystem.getNamespaceDirs(conf) returns the metadata directories: it reads the configured
    //value and falls back to the default if none is set. The defaults live in hdfs-default.xml and
    //in core-default.xml from hadoop-common: file://${hadoop.tmp.dir}/dfs/name,
    //where ${hadoop.tmp.dir} is /tmp/hadoop-${user.name}.
    //You can check whether your running NameNode actually uses this directory.
    FSImage fsImage = new FSImage(conf,
        FSNamesystem.getNamespaceDirs(conf),
            //FSNamesystem.getNamespaceEditsDirs(conf) returns the edits log directories, resolved the same way;
            //by default the edits log lives in the same directory as the namespace (see the config for details)
        FSNamesystem.getNamespaceEditsDirs(conf));
    //Instantiate the FSNamesystem object and hand the fsImage object to it
    FSNamesystem namesystem = new FSNamesystem(conf, fsImage, false);
    StartupOption startOpt = NameNode.getStartupOption(conf);
    if (startOpt == StartupOption.RECOVER) {
      namesystem.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
    }

    long loadStart = now();
    try {
      //Here the FSNamesystem loads the fsimage and the edits log into memory
      //and merges the two files into the new fsimage state.
      //(Note: a periodic checkpoint also merges the old fsimage with the edits log
      // into a new fsimage file; at startup the same merge is needed to produce the latest fsimage.)
      //In the end a complete copy of the metadata is held in memory.
      namesystem.loadFSImage(startOpt);
    } catch (IOException ioe) {
      LOG.warn("Encountered exception loading fsimage", ioe);
      fsImage.close();
      throw ioe;
    }
    long timeTakenToLoadFSImage = now() - loadStart;
    LOG.info("Finished loading FSImage in "   timeTakenToLoadFSImage   " msecs");
    NameNodeMetrics nnMetrics = NameNode.getNameNodeMetrics();
    if (nnMetrics != null) {
      nnMetrics.setFsImageLoadTime((int) timeTakenToLoadFSImage);
    }
    return namesystem;
  }

So what does this method actually do?

1. It builds an FSImage object. Note the arguments: the fsimage directories and the edits log directories are both passed in, so you can already guess that a new fsimage will be produced by merging the two. No data is loaded yet at this point.

2. The FSImage object, which will end up holding the complete metadata, is handed to the FSNamesystem.

3. Only then does the actual loading happen: everything before was just setting up the environment, configuration options and checks; the load and merge inside loadFromDisk is where the real work is done.
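As a quick way to see which directories a given configuration would resolve to, here is a small sketch, assuming only the public Configuration API and the standard keys dfs.namenode.name.dir / dfs.namenode.edits.dir (the fallback values below mirror the defaults quoted in the comments; this is not code copied from Hadoop):

import org.apache.hadoop.conf.Configuration;

import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class NameDirProbe {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // loads core-default.xml / core-site.xml
    // hdfs-default.xml ships file://${hadoop.tmp.dir}/dfs/name; spell the fallback out
    // here because a bare Configuration does not load hdfs-default.xml
    String tmpDir = conf.get("hadoop.tmp.dir",
        "/tmp/hadoop-" + System.getProperty("user.name"));
    String nameDirs = conf.get("dfs.namenode.name.dir", "file://" + tmpDir + "/dfs/name");
    // the edits dirs fall back to the name dirs when dfs.namenode.edits.dir is unset
    String editsDirs = conf.get("dfs.namenode.edits.dir", nameDirs);

    List<URI> imageDirs = new ArrayList<>();
    for (String d : nameDirs.split(",")) {
      imageDirs.add(URI.create(d.trim()));
    }
    System.out.println("fsimage dirs: " + imageDirs);
    System.out.println("edits dirs:   " + editsDirs);
  }
}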

Constructing the FSNamesystem itself mostly just reads configuration and initializes options. There is a lot of code in there, so let's focus on the javadoc.

The javadoc states explicitly that the constructor does not load any data, it only builds the object; to load data you use loadFromDisk, which ties back to what we saw above.

/**
   * Create an FSNamesystem associated with the specified image.
   * i.e. create an FSNamesystem object associated with the given image (metadata snapshot).
   * Note that this does not load any data off of disk -- if you would
   * like that behavior, use {@link #loadFromDisk(Configuration)}
   * In other words: nothing is loaded from disk here; to load data, use loadFromDisk(Configuration),
   * which is exactly what the startup path does.
   * @param conf configuration
   * @param fsImage The FSImage to associate with
   * @param ignoreRetryCache Whether or not should ignore the retry cache setup
   *                         step. For Secondary NN this should be set to true.
   * @throws IOException on bad configuration
   */
  FSNamesystem(Configuration conf, FSImage fsImage, boolean ignoreRetryCache)
      throws IOException {

So let's go straight to the core method that actually loads the data:

the loadFSImage() method.

What will it roughly do?

1. Load the fsimage file and the edits log file and merge them into one complete in-memory copy of the metadata (the new fsimage).

2. Write the new fsimage back to disk, replacing the old file.

3. And the edits log? A fresh one is opened as well. (See the sketch after this list for a way to observe this on disk.)
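To observe point 3 on a real NameNode, a tiny sketch like the one below (the default path is an assumption; point it at your own dfs.namenode.name.dir) lists the current/ subdirectory of the name dir, where the fsimage_<txid> files, edits_* segments and the seen_txid file live; run it before and after a restart and compare:

import java.io.File;

// Hypothetical path: replace with your own dfs.namenode.name.dir value.
public class ListNameDir {
  public static void main(String[] args) {
    String nameDir = args.length > 0 ? args[0]
        : "/tmp/hadoop-" + System.getProperty("user.name") + "/dfs/name";
    File current = new File(nameDir, "current");
    File[] files = current.listFiles();
    if (files == null) {
      System.out.println("No such directory: " + current);
      return;
    }
    // Typically shows fsimage_<txid>, fsimage_<txid>.md5, edits_<start>-<end>,
    // edits_inprogress_<txid>, seen_txid and VERSION.
    for (File f : files) {
      System.out.println(f.getName() + "\t" + f.length() + " bytes");
    }
  }
}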

/**
* Load the fsimage file and the edits log file, merge them into one complete in-memory copy
* of the metadata, write it back to disk, and note that a fresh edits log file is opened as well.
**/
private void loadFSImage(StartupOption startOpt) throws IOException {
    final FSImage fsImage = getFSImage();
    //At startup the NameNode reads the fsimage and edits log files,
    //merges them in memory into a new fsimage that holds the complete, up-to-date metadata,
    //writes it back to disk to replace the old fsimage, and opens a new edits log file.
    // format before starting up if requested
      //handle the format option first
    if (startOpt == StartupOption.FORMAT) {
      
      fsImage.format(this, fsImage.getStorage().determineClusterId());// reuse current id

      startOpt = StartupOption.REGULAR;
    }
    boolean success = false;
    writeLock();
    try {
      // We shouldn't be calling saveNamespace if we've come up in standby state.
        //i.e. do not save the namespace when we come up in standby state
      MetaRecoveryContext recovery = startOpt.createRecoveryContext();
      //load the data and merge it
      final boolean staleImage
          = fsImage.recoverTransitionRead(startOpt, this, recovery);
      if (RollingUpgradeStartupOption.ROLLBACK.matches(startOpt) ||
          RollingUpgradeStartupOption.DOWNGRADE.matches(startOpt)) {
        rollingUpgradeInfo = null;
      }
      final boolean needToSave = staleImage && !haEnabled && !isRollingUpgrade(); 
      LOG.info("Need to save fs image? "   needToSave
            " (staleImage="   staleImage   ", haEnabled="   haEnabled
            ", isRollingUpgrade="   isRollingUpgrade()   ")");

      //If the merged fsimage needs to be persisted, call saveNamespace,
        //which writes the latest image back to disk
      if (needToSave) {
        fsImage.saveNamespace(this);
      } else {
        updateStorageVersionForRollingUpgrade(fsImage.getLayoutVersion(),
            startOpt);
        // No need to save, so mark the phase done.
        StartupProgress prog = NameNode.getStartupProgress();
        prog.beginPhase(Phase.SAVING_CHECKPOINT);
        prog.endPhase(Phase.SAVING_CHECKPOINT);
      }
      //Open a new edits log file for writing
      // This will start a new log segment and write to the seen_txid file, so
      // we shouldn't do it when coming up in standby state
      if (!haEnabled || (haEnabled && startOpt == StartupOption.UPGRADE)
          || (haEnabled && startOpt == StartupOption.UPGRADEONLY)) {
        fsImage.openEditLogForWrite();
      }
      success = true;
    } finally {
      if (!success) {
        fsImage.close();
      }
      writeUnlock();
    }
    imageLoadComplete();
  }

Apart from the routine checks and format handling, the important call is the entry point that loads the fsimage and edits log files:

fsImage.recoverTransitionRead(startOpt, this, recovery);

What would you expect it to do? Load the fsimage file and the edits log file and merge them.

 /**
   * Analyze storage directories.
   * Recover from previous transitions if required. 
   * Perform fs state transition if necessary depending on the namespace info.
   * Read storage info. 
   * That is: analyze the storage directories (the ones holding the fsimage and the edits log),
   * recover from any previous transitions,
   * perform an fs state transition if the namespace info requires it,
   * and read the stored info.
   * Put simply: if an fsimage and edits log already exist, load them from the files and recover the state.
   * @throws IOException
   * @return true if the image needs to be saved or false otherwise
   */
  boolean recoverTransitionRead(StartupOption startOpt, FSNamesystem target,
      MetaRecoveryContext recovery)
      throws IOException {
    assert startOpt != StartupOption.FORMAT : 
      "NameNode formatting should be performed before reading the image";
    //Get the fsImage file locations, i.e. the image directories
    Collection<URI> imageDirs = storage.getImageDirectories();
    //Get the edits log directories
    Collection<URI> editsDirs = editLog.getEditURIs();

    // none of the data dirs exist
    if((imageDirs.size() == 0 || editsDirs.size() == 0) 
                             && startOpt != StartupOption.IMPORT)  
      throw new IOException(
          "All specified directories are not accessible or do not exist.");
    
    // 1. For each data directory calculate its state and 
    // check whether all is consistent before transitioning.
    //Check each data directory and verify its state is consistent before transitioning;
    //the recovery below handles directories left mid-way through an upgrade, rollback or write at shutdown
    Map<StorageDirectory, StorageState> dataDirStates = 
             new HashMap<StorageDirectory, StorageState>();
  
    boolean isFormatted = recoverStorageDirs(startOpt, storage, dataDirStates);

    if (LOG.isTraceEnabled()) {
      LOG.trace("Data dir states:\n  "  
        Joiner.on("\n  ").withKeyValueSeparator(": ")
        .join(dataDirStates));
    }
    
    if (!isFormatted && startOpt != StartupOption.ROLLBACK 
                     && startOpt != StartupOption.IMPORT) {
      throw new IOException("NameNode is not formatted.");      
    }


    int layoutVersion = storage.getLayoutVersion();
    if (startOpt == StartupOption.METADATAVERSION) {
      System.out.println("HDFS Image Version: "   layoutVersion);
      System.out.println("Software format version: "  
        HdfsConstants.NAMENODE_LAYOUT_VERSION);
      return false;
    }

    if (layoutVersion < Storage.LAST_PRE_UPGRADE_LAYOUT_VERSION) {
      NNStorage.checkVersionUpgradable(storage.getLayoutVersion());
    }
    if (startOpt != StartupOption.UPGRADE
        && startOpt != StartupOption.UPGRADEONLY
        && !RollingUpgradeStartupOption.STARTED.matches(startOpt)
        && layoutVersion < Storage.LAST_PRE_UPGRADE_LAYOUT_VERSION
        && layoutVersion != HdfsConstants.NAMENODE_LAYOUT_VERSION) {
      throw new IOException(
          "\nFile system image contains an old layout version "
          + storage.getLayoutVersion() + ".\nAn upgrade to version "
          + HdfsConstants.NAMENODE_LAYOUT_VERSION + " is required.\n"
          + "Please restart NameNode with the \""
          + RollingUpgradeStartupOption.STARTED.getOptionString()
          + "\" option if a rolling upgrade is already started;"
          + " or restart NameNode with the \""
          + StartupOption.UPGRADE.getName() + "\" option to start"
          + " a new upgrade.");
    }
    //Process the startup options and any upgrade-related bookkeeping
    storage.processStartupOptionsForUpgrade(startOpt, layoutVersion);

    // 2. Format unformatted dirs.
    for (Iterator<StorageDirectory> it = storage.dirIterator(); it.hasNext();) {
      StorageDirectory sd = it.next();
      StorageState curState = dataDirStates.get(sd);
      switch(curState) {
      case NON_EXISTENT:
        throw new IOException(StorageState.NON_EXISTENT +
                              " state cannot be here");
      case NOT_FORMATTED:
        LOG.info("Storage directory "   sd.getRoot()   " is not formatted.");
        LOG.info("Formatting ...");
        sd.clearDirectory(); // create empty currrent dir
        break;
      default:
        break;
      }
    }

    // 3. Do transitions
    switch(startOpt) {
    case UPGRADE:
    case UPGRADEONLY:
      doUpgrade(target);
      return false; // upgrade saved image already
    case IMPORT:
      doImportCheckpoint(target);
      return false; // import checkpoint saved image already
    case ROLLBACK:
      throw new AssertionError("Rollback is now a standalone command, "
          + "NameNode should not be starting with this option.");
    case REGULAR:
    default:
      // just load the image
    }
    //This is where the fsImage and edits log files are really loaded and merged
    return loadFSImage(target, startOpt, recovery);
  }

Here we see yet another loadFSImage; is this the one that really loads the data and merges it?

/**
   * Choose latest image from one of the directories,
   * load it and merge with the edits.
   *
   * i.e. pick the latest image (the full snapshot) and merge it with the edits log;
   * this is finally the merge we have been looking for.
   * 
   * Saving and loading fsimage should never trigger symlink resolution. 
   * The paths that are persisted do not have *intermediate* symlinks 
   * because intermediate symlinks are resolved at the time files, 
   * directories, and symlinks are created. All paths accessed while 
   * loading or saving fsimage should therefore only see symlinks as 
   * the final path component, and the functions called below do not
   * resolve symlinks that are the final path component.
   *
   * @return whether the image should be saved
   * @throws IOException
   */
  private boolean loadFSImage(FSNamesystem target, StartupOption startOpt,
      MetaRecoveryContext recovery)
      throws IOException {
    final boolean rollingRollback
        = RollingUpgradeStartupOption.ROLLBACK.matches(startOpt);
    final EnumSet<NameNodeFile> nnfs;
    if (rollingRollback) {
      // if it is rollback of rolling upgrade, only load from the rollback image
      nnfs = EnumSet.of(NameNodeFile.IMAGE_ROLLBACK);
    } else {
      // otherwise we can load from both IMAGE and IMAGE_ROLLBACK
      nnfs = EnumSet.of(NameNodeFile.IMAGE, NameNodeFile.IMAGE_ROLLBACK);
    }
    final FSImageStorageInspector inspector = storage
        .readAndInspectDirs(nnfs, startOpt);

    isUpgradeFinalized = inspector.isUpgradeFinalized();
    List<FSImageFile> imageFiles = inspector.getLatestImages();

    StartupProgress prog = NameNode.getStartupProgress();
    prog.beginPhase(Phase.LOADING_FSIMAGE);
    File phaseFile = imageFiles.get(0).getFile();
    prog.setFile(Phase.LOADING_FSIMAGE, phaseFile.getAbsolutePath());
    prog.setSize(Phase.LOADING_FSIMAGE, phaseFile.length());
    boolean needToSave = inspector.needToSave();

    Iterable<EditLogInputStream> editStreams = null;

    initEditLog(startOpt);

    if (NameNodeLayoutVersion.supports(
        LayoutVersion.Feature.TXID_BASED_LAYOUT, getLayoutVersion())) {
      // If we're open for write, we're either non-HA or we're the active NN, so
      // we better be able to load all the edits. If we're the standby NN, it's
      // OK to not be able to read all of edits right now.
      // In the meanwhile, for HA upgrade, we will still write editlog thus need
      // this toAtLeastTxId to be set to the max-seen txid
      // For rollback in rolling upgrade, we need to set the toAtLeastTxId to
      // the txid right before the upgrade marker.  
      long toAtLeastTxId = editLog.isOpenForWrite() ? inspector
          .getMaxSeenTxId() : 0;
      if (rollingRollback) {
        // note that the first image in imageFiles is the special checkpoint
        // for the rolling upgrade
        toAtLeastTxId = imageFiles.get(0).getCheckpointTxId() + 2;
      }
      //Load the edits log streams
      editStreams = editLog.selectInputStreams(
          imageFiles.get(0).getCheckpointTxId() + 1,
          toAtLeastTxId, recovery, false);
    } else {
      editStreams = FSImagePreTransactionalStorageInspector
        .getEditLogStreams(storage);
    }
    int maxOpSize = conf.getInt(DFSConfigKeys.DFS_NAMENODE_MAX_OP_SIZE_KEY,
        DFSConfigKeys.DFS_NAMENODE_MAX_OP_SIZE_DEFAULT);
    for (EditLogInputStream elis : editStreams) {
      elis.setMaxOpSize(maxOpSize);
    }
 
    for (EditLogInputStream l : editStreams) {
      LOG.debug("Planning to load edit log stream: "   l);
    }
    if (!editStreams.iterator().hasNext()) {
      LOG.info("No edit log streams selected.");
    }
    
    FSImageFile imageFile = null;
    for (int i = 0; i < imageFiles.size(); i++) {
      try {
        imageFile = imageFiles.get(i);
        //Load the image file
        loadFSImageFile(target, recovery, imageFile, startOpt);
        break;
      } catch (IOException ioe) {
        LOG.error("Failed to load image from "   imageFile, ioe);
        target.clear();
        imageFile = null;
      }
    }
    // Failed to load any images, error out
    if (imageFile == null) {
      FSEditLog.closeAllStreams(editStreams);
      throw new IOException("Failed to load an FSImage file!");
    }
    prog.endPhase(Phase.LOADING_FSIMAGE);
    
    if (!rollingRollback) {
      long txnsAdvanced = loadEdits(editStreams, target, startOpt, recovery);
      needToSave |= needsResaveBasedOnStaleCheckpoint(imageFile.getFile(),
          txnsAdvanced);
      if (RollingUpgradeStartupOption.DOWNGRADE.matches(startOpt)) {
        // rename rollback image if it is downgrade
        renameCheckpoint(NameNodeFile.IMAGE_ROLLBACK, NameNodeFile.IMAGE);
      }
    } else {
      // Trigger the rollback for rolling upgrade. Here lastAppliedTxId equals
      // to the last txid in rollback fsimage.
      rollingRollback(lastAppliedTxId + 1, imageFiles.get(0).getCheckpointTxId());
      needToSave = false;
    }
    editLog.setNextTxId(lastAppliedTxId + 1);
    return needToSave;
  }

  /** rollback for rolling upgrade. */
  private void rollingRollback(long discardSegmentTxId, long ckptId)
      throws IOException {
    // discard discard unnecessary editlog segments starting from the given id
    this.editLog.discardSegments(discardSegmentTxId);
    // rename the special checkpoint
    renameCheckpoint(ckptId, NameNodeFile.IMAGE_ROLLBACK, NameNodeFile.IMAGE,
        true);
    // purge all the checkpoints after the marker
    archivalManager.purgeCheckpoinsAfter(NameNodeFile.IMAGE, ckptId);
    String nameserviceId = DFSUtil.getNamenodeNameServiceId(conf);
    if (HAUtil.isHAEnabled(conf, nameserviceId)) {
      // close the editlog since it is currently open for write
      this.editLog.close();
      // reopen the editlog for read
      this.editLog.initSharedJournalsForRead();
    }
  }

Next comes loadFSImageFile, which loads the actual file. This part gets a bit dry, so keep the big picture in mind: we already know a file is being loaded, but where does the loaded data end up? Since FSNamesystem is the core component that manages metadata, the data ultimately has to land inside it, and that is why FSNamesystem keeps being passed along as a parameter (target) through all of these calls.
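The pattern described here, a loader that is constructed with a reference to the target object and fills it in while reading instead of returning a new structure, looks roughly like this sketch (all class names are invented for illustration and are not Hadoop's):

import java.util.HashMap;
import java.util.Map;

// Toy "namesystem": the component that owns the in-memory metadata.
class ToyNamesystem {
  final Map<String, String> namespace = new HashMap<>();
}

// Toy "loader": holds a reference to the target and fills it while reading.
class ToyImageLoader {
  private final ToyNamesystem target;

  ToyImageLoader(ToyNamesystem target) {
    this.target = target;
  }

  void load(String[] records) {
    // every record read from "disk" is applied directly to the target,
    // so after load() returns, the namesystem already holds the metadata
    for (String r : records) {
      target.namespace.put(r, "dir");
    }
  }
}

public class LoaderPattern {
  public static void main(String[] args) {
    ToyNamesystem ns = new ToyNamesystem();
    new ToyImageLoader(ns).load(new String[]{"/user", "/tmp"});
    System.out.println(ns.namespace); // {/tmp=dir, /user=dir}
  }
}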

/**
   * We will not trace all the way into the file parsing here;
   * the loadFSImage() method below is what finally reads the file.
   */
  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
      FSImageFile imageFile, StartupOption startupOption) throws IOException {
    LOG.debug("Planning to load image :\n"   imageFile);
    StorageDirectory sdForProperties = imageFile.sd;
    storage.readProperties(sdForProperties, startupOption);

    if (NameNodeLayoutVersion.supports(
        LayoutVersion.Feature.TXID_BASED_LAYOUT, getLayoutVersion())) {
      // For txid-based layout, we should have a .md5 file
      // next to the image file
      boolean isRollingRollback = RollingUpgradeStartupOption.ROLLBACK
          .matches(startupOption);
      loadFSImage(imageFile.getFile(), target, recovery, isRollingRollback);
    } else if (NameNodeLayoutVersion.supports(
        LayoutVersion.Feature.FSIMAGE_CHECKSUM, getLayoutVersion())) {
      // In 0.22, we have the checksum stored in the VERSION file.
      String md5 = storage.getDeprecatedProperty(
          NNStorage.DEPRECATED_MESSAGE_DIGEST_PROPERTY);
      if (md5 == null) {
        throw new InconsistentFSStateException(sdForProperties.getRoot(),
            "Message digest property "  
            NNStorage.DEPRECATED_MESSAGE_DIGEST_PROPERTY  
            " not set for storage directory "   sdForProperties.getRoot());
      }
      loadFSImage(imageFile.getFile(), new MD5Hash(md5), target, recovery,
          false);
    } else {
      // We don't have any record of the md5sum
      loadFSImage(imageFile.getFile(), null, target, recovery, false);
    }
  }
/**
   * Load in the filesystem image from file. It's a big list of
   * filenames and blocks.
   * i.e. load the fsimage file.
   */
  private void loadFSImage(File curFile, MD5Hash expectedMd5,
      FSNamesystem target, MetaRecoveryContext recovery,
      boolean requireSameLayoutVersion) throws IOException {
    // BlockPoolId is required when the FsImageLoader loads the rolling upgrade
    // information. Make sure the ID is properly set.
    target.setBlockPoolId(this.getBlockPoolID());
    //A loader that holds the FSNamesystem and the conf. Why does it hold the FSNamesystem?
    //Because FSNamesystem is the core metadata component; the loaded metadata ends up stored in it.
    FSImageFormat.LoaderDelegator loader = FSImageFormat.newLoader(conf, target);
    loader.load(curFile, requireSameLayoutVersion);

    // Check that the image digest we loaded matches up with what
    // we expected
    MD5Hash readImageMd5 = loader.getLoadedImageMd5();
    if (expectedMd5 != null &&
        !expectedMd5.equals(readImageMd5)) {
      throw new IOException("Image file "   curFile  
          " is corrupt with MD5 checksum of "   readImageMd5  
          " but expecting "   expectedMd5);
    }

    long txId = loader.getLoadedImageTxId();
    LOG.info("Loaded image for txid "   txId   " from "   curFile);
    lastAppliedTxId = txId;
    storage.setMostRecentCheckpointInfo(txId, curFile.lastModified());
  }

At this point it is clear that a loader component is instantiated to load the fsimage file, and underneath it simply reads the file as a stream and verifies its MD5 digest.
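The MD5 check at the end of loadFSImage is the usual "hash the bytes while reading and compare against the recorded digest" pattern. A minimal sketch with the JDK's MessageDigest (the file name and expected digest below are just placeholders) looks like this:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Md5Check {
  public static void main(String[] args) throws Exception {
    // Hypothetical inputs: an fsimage file and the digest recorded next to it.
    Path image = Paths.get(args.length > 0 ? args[0] : "fsimage_0000000000000000042");
    String expectedMd5 = args.length > 1 ? args[1] : "d41d8cd98f00b204e9800998ecf8427e";

    MessageDigest md = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(image)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        md.update(buf, 0, n); // hash the file while streaming it, like the image loader does
      }
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b));
    }
    String readImageMd5 = sb.toString();
    if (!readImageMd5.equals(expectedMd5)) {
      throw new java.io.IOException("Image file " + image + " is corrupt: got "
          + readImageMd5 + " but expected " + expectedMd5);
    }
    System.out.println("MD5 matches: " + readImageMd5);
  }
}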

Next comes loading the edits log and merging it in; the entry point is:

 //Load the edits log; applying it is what merges the edits log into the fsImage state
      long txnsAdvanced = loadEdits(editStreams, target, startOpt, recovery);

Notice that here, too, the editStreams and the FSNamesystem are passed in as parameters, so underneath it presumably also uses a loader to read the edits log and apply it to the FSNamesystem, plus some extra work such as checking and selecting the streams.
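A stripped-down sketch of that replay loop (all types here are invented; only the txid bookkeeping loosely mirrors what loadEdits/loadFSEdits do with lastAppliedTxId and the expected starting txid) might look like this:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy edit op: a transaction id plus an operation recorded in the journal.
class EditOp {
  final long txid;
  final String type;   // "MKDIR" or "DELETE"
  final String path;
  EditOp(long txid, String type, String path) {
    this.txid = txid; this.type = type; this.path = path;
  }
}

public class ReplayEdits {
  public static void main(String[] args) {
    Map<String, String> namespace = new HashMap<>();
    namespace.put("/tmp/old", "dir");    // state restored from the fsimage
    long lastAppliedTxId = 42;           // last txid covered by the fsimage checkpoint
    long expectedStartingTxId = lastAppliedTxId + 1;

    List<EditOp> stream = Arrays.asList(
        new EditOp(43, "MKDIR", "/user/logs"),
        new EditOp(44, "DELETE", "/tmp/old"));

    for (EditOp op : stream) {
      if (op.txid < expectedStartingTxId) {
        continue;                        // already contained in the image, skip it
      }
      if ("MKDIR".equals(op.type)) {
        namespace.put(op.path, "dir");
      } else if ("DELETE".equals(op.type)) {
        namespace.remove(op.path);
      }
      lastAppliedTxId = op.txid;         // advance op by op, like the real loader
    }
    long txnsAdvanced = lastAppliedTxId - 42;
    System.out.println(namespace + ", applied " + txnsAdvanced
        + " txns, last txid " + lastAppliedTxId);
  }
}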

 


  /**
   * Load the specified list of edit files into the image.
   * i.e. load the listed edit files into the image;
   * loading means replaying the recorded operations once more.
   */
  public long loadEdits(Iterable<EditLogInputStream> editStreams,
      FSNamesystem target) throws IOException {
    return loadEdits(editStreams, target, null, null);
  }

  private long loadEdits(Iterable<EditLogInputStream> editStreams,
      FSNamesystem target, StartupOption startOpt, MetaRecoveryContext recovery)
      throws IOException {
    LOG.debug("About to load edits:\n  "   Joiner.on("\n  ").join(editStreams));
    StartupProgress prog = NameNode.getStartupProgress();
    prog.beginPhase(Phase.LOADING_EDITS);

    long prevLastAppliedTxId = lastAppliedTxId;
    try {
      //Instantiate a loader component which, just like the FSImage loader, holds the FSNamesystem
      FSEditLogLoader loader = new FSEditLogLoader(target, lastAppliedTxId);

      // Load latest edits
      //Load the latest edits log
      for (EditLogInputStream editIn : editStreams) {
        LOG.info("Reading "   editIn   " expecting start txid #"  
              (lastAppliedTxId   1));
        try {
          //Read the stream; underneath this is plain file-stream I/O
          loader.loadFSEdits(editIn, lastAppliedTxId + 1, startOpt, recovery);
        } finally {
          // Update lastAppliedTxId even in case of error, since some ops may
          // have been successfully applied before the error.
          lastAppliedTxId = loader.getLastAppliedTxId();
        }
        // If we are in recovery mode, we may have skipped over some txids.
        if (editIn.getLastTxId() != HdfsConstants.INVALID_TXID) {
          lastAppliedTxId = editIn.getLastTxId();
        }
      }
    } finally {
      FSEditLog.closeAllStreams(editStreams);
      // update the counts
      updateCountForQuota(target.dir.rootDir);
    }
    prog.endPhase(Phase.LOADING_EDITS);
    return lastAppliedTxId - prevLastAppliedTxId;
  }
/**
   * Load an edit log, and apply the changes to the in-memory structure
   * This is where we apply edits that we've been writing to disk all
   * along.
   * i.e. load the edits log and apply the changes to the in-memory structures:
   * the operations recorded in the edits log are simply replayed once more.
   */
  long loadFSEdits(EditLogInputStream edits, long expectedStartingTxId,
      StartupOption startOpt, MetaRecoveryContext recovery) throws IOException {
    StartupProgress prog = NameNode.getStartupProgress();
    Step step = createStartupProgressStep(edits);
    prog.beginStep(Phase.LOADING_EDITS, step);
    fsNamesys.writeLock();
    try {
      long startTime = now();
      FSImage.LOG.info("Start loading edits file "   edits.getName());
      long numEdits = loadEditRecords(edits, false, expectedStartingTxId,
          startOpt, recovery);
      FSImage.LOG.info("Edits file "   edits.getName() 
            " of size "   edits.length()   " edits # "   numEdits 
            " loaded in "   (now()-startTime)/1000   " seconds");
      return numEdits;
    } finally {
      edits.close();
      fsNamesys.writeUnlock();
      prog.endStep(Phase.LOADING_EDITS, step);
    }
  }

By now the overall logic should be clear. At the end a needToSave flag is returned; as shown earlier, it decides whether the merged fsimage gets written back to disk.

Quick summary: the steps of FSNamesystem initialization

1. Read the fsimage and edits log directories from the configuration.

2. Construct the FSNamesystem, passing in the fsimage and edits log directories along with other settings.

3. loadFromDisk loads the fsimage and edits log files into memory.

4. Merge the fsimage and the edits log into a new fsimage, i.e. the complete metadata, and keep a copy of it in memory.

5. Replace the old fsimage file on disk with the new, complete one.

6. Open a new edits log file to record incremental transactions.

 

 

Source: https://www./content-1-823401.html
